Skip to main content
Version: Next

Workflows

Configuration objects

Ready-to-deploy configuration

A default workflow configuration is with a jobtemplate.yaml already part of the products. In advanced use cases it might be necessary to modify job templates or change the configuration.

Basics

To define workflows, jadice flow uses so called JobTemplates. A JobTemplate can consist of 1-n StepTemplates which define which steps to run and which configuration to use.

High level architecture

A StepTemplate is a single piece of work inside a job, e.g. "Decompress ZIP", or "Convert JPEG to PDF"

The example workflow "image-to-pdf-with-hocr" looks like this: High level architecture

Steps can have additional configuration parameters - this is an example from the OCR-step in the UI: High level architecture

The default configuration parameters for any step are explained in more detail in the following chapters.

Defining a JobTemplate

A JobTemplate should contain at least one step.

Each JobTemplate has the following default configuration:

  • enabled: Job templates can be disabled. Disabled templates are not available in the list of job templates for the user in the UI and requests which correspond to a disabled template will lead to an error message for the Rest-Caller.
  • Step Name: Name of the step in the workflow / JobTemplate
  • Worker Name: Name of the Worker for this step. The list of available workers is determined by the workers.yaml configuration file. The Worker Name must match an existing worker configuration from the configuration file.
  • Fail on missing Result: Many workers (e.g. ImageConversion) are expected to return a result and should use this configuration option. If a worker does not return a new data stream result, the Job would fail in such case.
  • Mark src as meta on result: Whether to mark the source part which was sent to the worker as META, indicating that the part has been processed. This is an important mechanism to understand: by default, a META part will not be processed by further steps. So if a source image part has been marked as META, it will no longer be processed by following workers.
  • Input mime type deprecated: An optional filter which can be set for the step. If one or more mime types are entered here, only parts / items which have the corresponding mime type will be sent to the worker.
  • Step input filter: An optional filter which can be set for the step. There are multiple type of filters available to filter elements by a specific property or URL for example. More details on this in the "Filters" chapter.

Many steps have additional configuration parameters, like the OCR-output mode in the ocr worker. The possible worker configuration parameters are defined in the workers.yaml file with their descriptions and default values. In the jobtemplates.yaml, which contains the specific configured jobTemplates, the parameters are also present with their specific values for the job template.

JobTemplates can be configured via the UI which utilizes the Rest endpoints to create/modify a JobTemplate. Another option is to directly enter the template in the jobTemplates.yaml (only recommended for advanced users).

Job flow

When defining a JobTemplate, the job flow has to be defined. The UI provides a convenience method to create a simple default sequential flow of the available templates.

A sequential flow is usually sufficient for many use cases, especially if there are not many steps. However, in some cases it might be interesting to define a more complex workflow with conditions.

Items to be processed are pushed through the workflow and the configured steps will perform the work.

It is possible to set Step Input Filters to filter elements. More on this later in the Filters chapter.

Simple sequential flow

When defining a sequential flow via the UI or yaml, the steps are added as Nodes to a graph. A Node has a successor, a predecessor and a condition, in which the successor shall be used.

Basic system overview

A special case is the first node, which does not have a value for the from-Field. Example sequential flow:

    jobFlow:
- from: null
"on": "*"
to: "filetypeAnalyzer"
- from: "filetypeAnalyzer"
"on": "COMPLETED"
to: "ocr"
- from: "ocr"
"on": "COMPLETED"
to: "pdf"

This example is a JobTemplate with 3 sequential steps:

  1. filetypeAnalyzer: Analyzes the input file (start node syntax: from="", on="*", to="filetypeAnalyzer"). Sets the contains-text property, if stream analysis is enabled in the worker.
  2. ocr: Performs optical character recognition for input images, if the contains-text property is not set or false.
  3. pdf: Converts images to PDF

Steps usually have an exit status of COMPLETED or FAILED after they ran. Some steps might also provide a different exit status. This is generally just a String which is used to find the next node.

When defining on=COMPLETED we make sure that ocr-step was successful (or skipped).

The filter-check step will return an exit code of TRUE or FALSE, depending whether its configured filter is matched or not. More details in the following chapter.

Filters

Filters can check various aspects of an item/part to check if the element should be processed or skipped. If an element is skipped by a step, this is noted in the item or part history with a SKIPPED entry. Otherwise, those elements should go to the PROCESSED state after work.

Filters can be used in 2 ways:

  • Set as a step input filter - items to be processed are checked by the filter. If no filter exists, all elements are relevant.
  • Use the filter-check local worker step in a dynamic job flow for job flow decisions. This step returns true or false, depending if an element passed the configured filter. More details on this in the Conditional flow chapter.

Usually, multiple filters can be added to a list.

Available filter Types:

  • ITEM_INDEX_FILTER: Checks the index values of an item (item index data).
  • ITEM_PROPERTY_FILTER: Checks the item's processing properties
  • PART_MIME_TYPE_FILTER: Checks the Part mime type
  • PART_FILENAME_FILTER: Checks the Part filename
  • PART_HASURL_FILTER: Checks if the part has a non null URL
  • PART_PROPERTY_FILTER: Checks a specific property of a part
  • PART_TYPE_FILTER: Checks the part type
  • PART_URL_FILTER: Checks the part URL

Filters which filter a property can use a Matcher to match the property value for the filter.

Matchers

Some Filters use a Matcher to check a specific property value. Since the property values are usually a simple String value, they can be checked in different ways.

Depending on the specific property, the appropriate Matcher can be chosen to check the value.

Available matcher types:

  • NON_NULL: Checks if the entry is non null. Json type name NonNullMatcher.
  • BOOLEAN: Interprets the value as boolean and checks vs the expected value. Json type name BooleanMatcher.
  • NUMBER: Interprets the value as number and allows further sub-checks (EQUALS, GREATER, LESSER, NOT). Json type name NumberMatcher.
  • REGEX: Checks if the value matches a given regular expression. Json type name RegexMatcher.
  • LITERAL_STRING: Allows further sub-checks on a literal string value (EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS). Json type name StringMatcher.

Example of a part property filter using a string matcher to check that the part URL starts with https://:

Example string matcher (json format)
{
"type": "PartUrlFilter",
"inclusive": true,
"matcher": {
"type": "StringMatcher",
"mode": "STARTS_WITH",
"string": "https://",
"ignoreCase": true
}
}

See also the examples section with filter examples where also different matchers are used.

Conditional flow

Apart from sequential flows, it is possible to define more complex flows using conditions.

As an example, we consider the following flow graph:

Basic system overview

This is the corresponding yaml section:

    jobFlow:
- from: ""
"on": "*"
to: "Filetype"
- from: "Filetype"
"on": "COMPLETE"
to: "HasZIP"
- from: "HasZIP"
"on": "true"
to: "Decompress"
- from: "HasZIP"
"on": "false"
to: "ToPDF"
- from: "Decompress"
"on": "COMPLETE"
to: "Filetype"

This job has the following steps:

  • Start node: Filetype to analyze the mime type of parts
  • HasZIP is a special step which uses the filter-check step. For example, this step allows to set a mime type filter to include / exclude specific mime types like ZIP archives. If the application/zip-formats are added to the configuration of that step, its return status will be true or false depending on whether such a mime type is present in the item. This step will not exit with COMPLETED or FAILED. This also means, that after a decision step, both routes for the exit values true or false should have a follow-up step. This is most likely the case anyway, since if there was only one follow-up step, no decision would be needed (maybe only a filter on the step itself).
  • In the Node-Graph, the Decompress step is only called if the item really has a ZIP. After decompressing, the flow moves back to the FileType step, so the extracted files also run through the flow.

A HasZIP step can look like Basic system overview

This example job template could be used to recursivly extract a ZIP containing further ZIPs.

Examples

jobTemplates.yaml example

On startup of the flow controller, the defined JobTemplates from the jobTemplates.yaml will be read.

The jobtemplate.yml below has a Job with the name "image-to-pdf-with-hocr". It converts an image to PDF/A 2b with hoCR layer.

The jobFlow defines the job flow graph (more details).

Example (yaml format)
---
jadice-flow.jobs:
jobTemplates:
- jobName: "image-to-pdf-with-hocr"
description: "Convert an image to PDF/A 2b with hOCR layer"
properties: {}
enabled: false
stepTemplates:
- stepName: "filetypeAnalyzer"
workerDefinitionName: "analyzer"
inputMimeTypes: []
inputFilterJson: ""
expectsNewPartResult: false
markSrcAsMetaOnResult: false
parameters:
- name: "forced.types"
type: "[Ljava.lang.String;"
subTypes: []
value: null
description: "Forces recognition of the given mime types"
- stepName: "ocr"
workerDefinitionName: "tessocr"
inputMimeTypes: []
inputFilterJson: ""
expectsNewPartResult: true
markSrcAsMetaOnResult: false
parameters:
- name: "output-formats"
type: "com.jadice.flow.worker.ocr.OCROutputSetting"
subTypes: []
value: "\"HOCR\""
description: "OCR output format(s)"
- stepName: "pdf"
workerDefinitionName: "combine-to-pdf"
inputMimeTypes: []
inputFilterJson: ""
expectsNewPartResult: false
markSrcAsMetaOnResult: true
parameters:
- name: "pdfaConformanceLevel"
type: "com.jadice.flow.worker.topdf.settings.Conformance"
subTypes: []
value: "\"PDFA2b\""
description: "PDF-A conformance level"
- name: "processingStepSettings"
type: "com.jadice.flow.worker.topdf.settings.ProcessingStepSettingsDTO"
subTypes: []
value: "{\"repackingAllowed\":false,\"genericProcessingAllowed\":true,\"rasterizationAllowed\"\
:true}"
description: "PDF export settings"
- name: "pdfStructureReaderSettings"
type: "com.jadice.flow.worker.topdf.settings.PDFStructureReaderSettingsDTO"
subTypes: []
value: "{\"readStrategy\":\"STRICT\"}"
description: "PDF reader settings. LENIENT-Mode enables to read some documents\
\ with structural defects"
- name: "outputMode"
type: "com.jadice.flow.worker.topdf.settings.OutputMode"
subTypes: []
value: "\"LAYERED\""
description: "PDF export output mode. Generate one stream per page or join\
\ all incoming documents together."
jobFlow:
- from: null
"on": "*"
to: "filetypeAnalyzer"
- from: "filetypeAnalyzer"
"on": "COMPLETED"
to: "ocr"
- from: "ocr"
"on": "COMPLETED"
to: "pdf"

Filter examples (only filter part of job template):

Example for a jobTemplates.yaml:

yaml example (filters)

---
jadice-flow.jobs:
jobTemplates:
- jobName: "StepFilterTest"
...
stepTemplates:
- stepName: "test"
...
inputFilterJson: "[{\"type\":\"ItemPropertyFilter\",\"inclusive\":true,\"propertyKey\"\
:\"key\",\"matcher\":{\"type\":\"NumberMatcher\",\"type\":\"NumberMatcher\"\
,\"mode\":\"NOT\",\"value\":\"42\"}}]"

Note: In the following json examples, the " character is not escaped for better readability. In a jobTemplates.yaml configuration file this is required like in the previous example. When using copy+paste from this examples to enter in the inputFilterJson field directly, you should first use a text editor to replace " with \".

Part Property Filter

Filter name: PART_PROPERTY_FILTER

Json Type Name: PartPropertyFilter

Description: Filters parts based on a value from the processing property map of the part.

Uses a selectable Matcher to check the value.

Filter-example

PartPropertyFilter json example

{
"type": "PartPropertyFilter",
"inclusive": false,
"propertyKey": "contains-text",
"matcher": {
"type": "BooleanMatcher",
"value": true
}
}

This filter EXCLUDES parts which have the contains-text property set. Can be used to filter items which do not need to be passed to OCR, for example. (The OCR step has a filter to the contains-text property also internally, so the filter is not strictly required to be present in the configuration).

Part URL Filter

Filter name: PART_URL_FILTER

Json Type Name: PartUrlFilter

Description: Uses a matcher to check the part's URL.

Uses a selectable Matcher to check the value.

PartUrlFilter json example

{
"type": "PartUrlFilter",
"inclusive": true,
"matcher": {
"type": "RegexMatcher",
"regex": "https://.*"
}
}

Part Has-URL Filter

Filter name: PART_HAS_URL_FILTER

Json Type Name: PartHasUrlFilter

Description: Checks if the part has a non-null URL.

PartHasUrlFilter json example

{
"type": "PartHasUrlFilter",
"inclusive": true
}

Part Mime type filter

Filter name: PART_MIME_TYPE_FILTER

Json Type Name: MimeTypeFilter

Description: Checks the part mime type. A list of mime types can be set. It is also possible to enter wildcard values like image/*.

Filter-example

MimeTypeFilter json example

{
"type": "MimeTypeFilter",
"inclusive": true,
"mimeTypes": [
"image/*",
"application/pdf"
]
}

Item Index Filter

Filter name: ITEM_INDEX_FILTER

Json Type Name: ItemIndexFilter

Description: Filters items based on a value from the "index data" property map of the item.

Uses a selectable Matcher to check the value.

ItemIndexFilter json example

{
"type": "ItemIndexFilter",
"inclusive": true,
"propertyKey": "testKey",
"matcher": {
"type": "StringMatcher",
"mode": "EQUALS",
"string": "test",
"ignoreCase": true
}
}

Item Property Filter

Filter name: ITEM_PROPERTY_FILTER

Json Type Name: ItemPropertyFilter

Description: Filters items based on a value from the processing property map of the item.

Uses a selectable Matcher to check the value.

ItemPropertyFilter json example

{
"type": "ItemPropertyFilter",
"inclusive": true,
"propertyKey": "testKey",
"matcher": {
"type": "NumberMatcher",
"mode": "GREATER",
"value": "42"
}
}