Workflows
Configuration objects
Ready-to-deploy configuration
A default workflow configuration is with a jobtemplate.yaml already part of the products. In advanced use cases it might be necessary to modify job templates or change the configuration.
Basics
To define workflows, jadice flow uses so called JobTemplates
. A JobTemplate
can consist of 1-n StepTemplates
which define which steps to run and which configuration to use.
A StepTemplate
is a single piece of work inside a job, e.g. "Decompress ZIP", or "Convert JPEG to PDF"
The example workflow "image-to-pdf-with-hocr" looks like this:
Steps can have additional configuration parameters - this is an example from the OCR-step in the UI:
The default configuration parameters for any step are explained in more detail in the following chapters.
Defining a JobTemplate
A JobTemplate
should contain at least one step.
Each JobTemplate
has the following default configuration:
enabled
: Job templates can be disabled. Disabled templates are not available in the list of job templates for the user in the UI and requests which correspond to a disabled template will lead to an error message for the Rest-Caller.Step Name
: Name of the step in the workflow / JobTemplateWorker Name
: Name of the Worker for this step. The list of available workers is determined by theworkers.yaml
configuration file. TheWorker Name
must match an existing worker configuration from the configuration file.Fail on missing Result
: Many workers (e.g. ImageConversion) are expected to return a result and should use this configuration option. If a worker does not return a new data stream result, the Job would fail in such case.Mark src as meta on result
: Whether to mark the source part which was sent to the worker asMETA
, indicating that the part has been processed. This is an important mechanism to understand: by default, a META part will not be processed by further steps. So if a source image part has been marked as META, it will no longer be processed by following workers.Input mime type
deprecated: An optional filter which can be set for the step. If one or more mime types are entered here, only parts / items which have the corresponding mime type will be sent to the worker.Step input filter
: An optional filter which can be set for the step. There are multiple type of filters available to filter elements by a specific property or URL for example. More details on this in the "Filters" chapter.
Many steps have additional configuration parameters, like the OCR-output mode in the ocr worker. The possible worker configuration parameters are defined in the workers.yaml file with their descriptions and default values. In the jobtemplates.yaml, which contains the specific configured jobTemplates, the parameters are also present with their specific values for the job template.
JobTemplates can be configured via the UI which utilizes the Rest endpoints to create/modify a JobTemplate
. Another option is to directly enter the template in the jobTemplates.yaml
(only recommended for advanced users).
Job flow
When defining a JobTemplate
, the job flow has to be defined. The UI provides a convenience method to create a simple default sequential flow of the available templates.
A sequential flow is usually sufficient for many use cases, especially if there are not many steps. However, in some cases it might be interesting to define a more complex workflow with conditions.
Items to be processed are pushed through the workflow and the configured steps will perform the work.
It is possible to set Step Input Filters
to filter elements. More on this later in the Filters chapter.
Simple sequential flow
When defining a sequential flow via the UI or yaml, the steps are added as Nodes
to a graph. A Node has a successor, a predecessor and a condition, in which the successor shall be used.
A special case is the first node, which does not have a value for the from
-Field.
Example sequential flow:
jobFlow:
- from: null
"on": "*"
to: "filetypeAnalyzer"
- from: "filetypeAnalyzer"
"on": "COMPLETED"
to: "ocr"
- from: "ocr"
"on": "COMPLETED"
to: "pdf"
This example is a JobTemplate
with 3 sequential steps:
filetypeAnalyzer
: Analyzes the input file (start node syntax: from="", on="*", to="filetypeAnalyzer"). Sets thecontains-text
property, if stream analysis is enabled in the worker.ocr
: Performs optical character recognition for input images, if thecontains-text
property is not set orfalse
.pdf
: Converts images to PDF
Steps usually have an exit status of COMPLETED or FAILED after they ran. Some steps might also provide a different exit status. This is generally just a String which is used to find the next node.
When defining on=COMPLETED we make sure that ocr-step was successful (or skipped).
The filter-check
step will return an exit code of TRUE or FALSE, depending whether its configured filter is matched or not. More details in the following chapter.
Filters
Filters
can check various aspects of an item/part to check if the element should be processed or skipped. If an element is skipped by a step, this is noted in the item or part history with a SKIPPED entry. Otherwise, those elements should go to the PROCESSED state after work.
Filters can be used in 2 ways:
- Set as a step input filter - items to be processed are checked by the filter. If no filter exists, all elements are relevant.
- Use the
filter-check
local worker step in a dynamic job flow for job flow decisions. This step returnstrue
orfalse
, depending if an element passed the configured filter. More details on this in theConditional flow
chapter.
Usually, multiple filters can be added to a list.
Available filter Types:
ITEM_INDEX_FILTER
: Checks the index values of an item (item index data).ITEM_PROPERTY_FILTER
: Checks the item's processing propertiesPART_MIME_TYPE_FILTER
: Checks the Part mime typePART_FILENAME_FILTER
: Checks the Part filenamePART_HASURL_FILTER
: Checks if the part has a non null URLPART_PROPERTY_FILTER
: Checks a specific property of a partPART_TYPE_FILTER
: Checks the part typePART_URL_FILTER
: Checks the part URL
Filters which filter a property can use a Matcher
to match the property value for the filter.
Matchers
Some Filters
use a Matcher to check a specific property value. Since the property values are usually a simple String value, they can be checked in different ways.
Depending on the specific property, the appropriate Matcher can be chosen to check the value.
Available matcher types:
NON_NULL
: Checks if the entry is non null. Json type nameNonNullMatcher
.BOOLEAN
: Interprets the value as boolean and checks vs the expected value. Json type nameBooleanMatcher
.NUMBER
: Interprets the value as number and allows further sub-checks (EQUALS, GREATER, LESSER, NOT). Json type nameNumberMatcher
.REGEX
: Checks if the value matches a given regular expression. Json type nameRegexMatcher
.LITERAL_STRING
: Allows further sub-checks on a literal string value (EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS). Json type nameStringMatcher
.
Example of a part property filter using a string matcher to check that the part URL starts with https://
:
Example string matcher (json format)
{
"type": "PartUrlFilter",
"inclusive": true,
"matcher": {
"type": "StringMatcher",
"mode": "STARTS_WITH",
"string": "https://",
"ignoreCase": true
}
}
See also the examples section with filter examples where also different matchers are used.
Conditional flow
Apart from sequential flows, it is possible to define more complex flows using conditions.
As an example, we consider the following flow graph:
This is the corresponding yaml section:
jobFlow:
- from: ""
"on": "*"
to: "Filetype"
- from: "Filetype"
"on": "COMPLETE"
to: "HasZIP"
- from: "HasZIP"
"on": "true"
to: "Decompress"
- from: "HasZIP"
"on": "false"
to: "ToPDF"
- from: "Decompress"
"on": "COMPLETE"
to: "Filetype"
This job has the following steps:
- Start node:
Filetype
to analyze the mime type of parts HasZIP
is a special step which uses thefilter-check
step. For example, this step allows to set a mime type filter to include / exclude specific mime types like ZIP archives. If theapplication/zip
-formats are added to the configuration of that step, its return status will betrue
orfalse
depending on whether such a mime type is present in the item. This step will not exit with COMPLETED or FAILED. This also means, that after a decision step, both routes for the exit valuestrue
orfalse
should have a follow-up step. This is most likely the case anyway, since if there was only one follow-up step, no decision would be needed (maybe only a filter on the step itself).- In the Node-Graph, the
Decompress
step is only called if the item really has a ZIP. After decompressing, the flow moves back to theFileType
step, so the extracted files also run through the flow.
A HasZIP
step can look like
This example job template could be used to recursivly extract a ZIP containing further ZIPs.
Examples
jobTemplates.yaml example
On startup of the flow controller, the defined JobTemplates
from the jobTemplates.yaml
will be read.
The jobtemplate.yml below has a Job with the name "image-to-pdf-with-hocr". It converts an image to PDF/A 2b with hoCR layer.
The jobFlow
defines the job flow graph (more details).
Example (yaml format)
---
jadice-flow.jobs:
jobTemplates:
- jobName: "image-to-pdf-with-hocr"
description: "Convert an image to PDF/A 2b with hOCR layer"
properties: {}
enabled: false
stepTemplates:
- stepName: "filetypeAnalyzer"
workerDefinitionName: "analyzer"
inputMimeTypes: []
inputFilterJson: ""
expectsNewPartResult: false
markSrcAsMetaOnResult: false
parameters:
- name: "forced.types"
type: "[Ljava.lang.String;"
subTypes: []
value: null
description: "Forces recognition of the given mime types"
- stepName: "ocr"
workerDefinitionName: "tessocr"
inputMimeTypes: []
inputFilterJson: ""
expectsNewPartResult: true
markSrcAsMetaOnResult: false
parameters:
- name: "output-formats"
type: "com.jadice.flow.worker.ocr.OCROutputSetting"
subTypes: []
value: "\"HOCR\""
description: "OCR output format(s)"
- stepName: "pdf"
workerDefinitionName: "combine-to-pdf"
inputMimeTypes: []
inputFilterJson: ""
expectsNewPartResult: false
markSrcAsMetaOnResult: true
parameters:
- name: "pdfaConformanceLevel"
type: "com.jadice.flow.worker.topdf.settings.Conformance"
subTypes: []
value: "\"PDFA2b\""
description: "PDF-A conformance level"
- name: "processingStepSettings"
type: "com.jadice.flow.worker.topdf.settings.ProcessingStepSettingsDTO"
subTypes: []
value: "{\"repackingAllowed\":false,\"genericProcessingAllowed\":true,\"rasterizationAllowed\"\
:true}"
description: "PDF export settings"
- name: "pdfStructureReaderSettings"
type: "com.jadice.flow.worker.topdf.settings.PDFStructureReaderSettingsDTO"
subTypes: []
value: "{\"readStrategy\":\"STRICT\"}"
description: "PDF reader settings. LENIENT-Mode enables to read some documents\
\ with structural defects"
- name: "outputMode"
type: "com.jadice.flow.worker.topdf.settings.OutputMode"
subTypes: []
value: "\"LAYERED\""
description: "PDF export output mode. Generate one stream per page or join\
\ all incoming documents together."
jobFlow:
- from: null
"on": "*"
to: "filetypeAnalyzer"
- from: "filetypeAnalyzer"
"on": "COMPLETED"
to: "ocr"
- from: "ocr"
"on": "COMPLETED"
to: "pdf"
Filter examples (only filter part of job template):
Example for a jobTemplates.yaml:
yaml example (filters)
---
jadice-flow.jobs:
jobTemplates:
- jobName: "StepFilterTest"
...
stepTemplates:
- stepName: "test"
...
inputFilterJson: "[{\"type\":\"ItemPropertyFilter\",\"inclusive\":true,\"propertyKey\"\
:\"key\",\"matcher\":{\"type\":\"NumberMatcher\",\"type\":\"NumberMatcher\"\
,\"mode\":\"NOT\",\"value\":\"42\"}}]"
Note:
In the following json examples, the "
character is not escaped for better readability. In a jobTemplates.yaml
configuration file this is required like in the previous example.
When using copy+paste from this examples to enter in the inputFilterJson
field directly, you should first use a text editor to replace "
with \"
.
Part Property Filter
Filter name: PART_PROPERTY_FILTER
Json Type Name: PartPropertyFilter
Description: Filters parts based on a value from the processing property map of the part.
Uses a selectable Matcher
to check the value.
PartPropertyFilter json example
{
"type": "PartPropertyFilter",
"inclusive": false,
"propertyKey": "contains-text",
"matcher": {
"type": "BooleanMatcher",
"value": true
}
}
This filter EXCLUDES parts which have the contains-text
property set. Can be used to filter items which do not need to be passed to OCR, for example. (The OCR step has a filter to the contains-text property also internally, so the filter is not strictly required to be present in the configuration).
Part URL Filter
Filter name: PART_URL_FILTER
Json Type Name: PartUrlFilter
Description: Uses a matcher to check the part's URL.
Uses a selectable Matcher
to check the value.
PartUrlFilter json example
{
"type": "PartUrlFilter",
"inclusive": true,
"matcher": {
"type": "RegexMatcher",
"regex": "https://.*"
}
}
Part Has-URL Filter
Filter name: PART_HAS_URL_FILTER
Json Type Name: PartHasUrlFilter
Description: Checks if the part has a non-null URL.
PartHasUrlFilter json example
{
"type": "PartHasUrlFilter",
"inclusive": true
}
Part Mime type filter
Filter name: PART_MIME_TYPE_FILTER
Json Type Name: MimeTypeFilter
Description: Checks the part mime type. A list of mime types can be set. It is also possible to enter wildcard values like image/*
.
MimeTypeFilter json example
{
"type": "MimeTypeFilter",
"inclusive": true,
"mimeTypes": [
"image/*",
"application/pdf"
]
}
Item Index Filter
Filter name: ITEM_INDEX_FILTER
Json Type Name: ItemIndexFilter
Description: Filters items based on a value from the "index data" property map of the item.
Uses a selectable Matcher
to check the value.
ItemIndexFilter json example
{
"type": "ItemIndexFilter",
"inclusive": true,
"propertyKey": "testKey",
"matcher": {
"type": "StringMatcher",
"mode": "EQUALS",
"string": "test",
"ignoreCase": true
}
}
Item Property Filter
Filter name: ITEM_PROPERTY_FILTER
Json Type Name: ItemPropertyFilter
Description: Filters items based on a value from the processing property map of the item.
Uses a selectable Matcher
to check the value.
ItemPropertyFilter json example
{
"type": "ItemPropertyFilter",
"inclusive": true,
"propertyKey": "testKey",
"matcher": {
"type": "NumberMatcher",
"mode": "GREATER",
"value": "42"
}
}