Job datamodel
jadice flow breaks the processing of documents and document data down into jobs.
A job is defined by a JobTemplate and can consist of multiple steps that are combined into a workflow (sequential or conditional/complex). The individual processing steps of a workflow are executed by remote workers. For example, one worker may convert Office documents using LibreOffice, while another worker transforms documents in other formats using jadice-native functionality, and finally the results may be merged into a single PDF output document.
More details on the configuration of JobTemplates can be found in the chapter Workflows.
JobRequest
JobRequests are used to start work on the controller. The controller uses a job queue to process requests. A priority can be assigned to a request, so it might overtake other requests with lower priority in the queue.
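The queue behaviour described above can be illustrated with a minimal sketch. It is not the controller's actual implementation: the class `JobQueue`, its methods, and the template names are hypothetical, and it assumes that a larger priority value means "more urgent" and that requests with equal priority keep their arrival order.

```python
import heapq
import itertools


class JobQueue:
    """Hypothetical in-memory queue, only meant to illustrate overtaking."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, job_template_name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        # The counter keeps arrival order among requests with equal priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), job_template_name))

    def next_request(self):
        _, _, job_template_name = heapq.heappop(self._heap)
        return job_template_name


queue = JobQueue()
queue.submit("TemplateA", priority=0)
queue.submit("UrgentTemplate", priority=5)  # submitted second, processed first
queue.submit("TemplateB", priority=0)
order = [queue.next_request() for _ in range(3)]
print(order)  # ['UrgentTemplate', 'TemplateA', 'TemplateB']
```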
Most essential is the jobTemplateName, which defines which workflow shall be used. Apart from that, the request contains the items and parts to be processed.
An item is a logical document. It can consist of multiple part elements. A part is a single image, like a .jpg file.
The following 3 model objects exist:
- Job (runtime) or JobRequest (request phase): A job usually consists of 1-n Item(s).
- Item: The item represents a document. It can have 0-n parts. Additionally, an item contains some meta information in the map processingProperties. This map contains values relevant for the workflow and the workers themselves. Furthermore, "index values" can be provided in the item's indexData. This map contains attributes relevant for the document, e.g. a customer number or other values which can be relevant when processing the item.
- Part: A single data stream with data to process. Parts have a type: BASE_PART for "active" parts which are relevant for the workflow. Parts which have been processed (e.g. an image has been converted and the result image is returned) can be marked as META. These META parts will by default not be handled by subsequent worker steps.
When sending a JobRequest, the request contains the Item (with Part) element(s) to process.
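To make the relationship between the three model objects more tangible, here is a minimal sketch in Python. It is an illustration only, not the jadice flow client API: the class layout, the snake_case attribute names, and the sample index value are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class PartType(Enum):
    BASE_PART = "BASE_PART"  # "active" part, relevant for the workflow
    META = "META"            # already processed, skipped by default


@dataclass
class Part:
    filename: str
    mime_type: str
    url: str = ""
    type: PartType = PartType.BASE_PART
    processing_properties: dict = field(default_factory=dict)


@dataclass
class Item:
    # An item is a logical document with 0-n parts.
    parts: list = field(default_factory=list)
    processing_properties: dict = field(default_factory=dict)  # workflow/worker values
    index_data: dict = field(default_factory=dict)             # document attributes


@dataclass
class JobRequest:
    job_template_name: str
    items: list = field(default_factory=list)  # usually 1-n items
    priority: int = 0


item = Item(index_data={"customerNumber": "4711"})  # sample value, made up
item.parts.append(Part(filename="car.jpg", mime_type="image/jpeg"))
request = JobRequest(job_template_name="ImageToPdfWithHocr", items=[item])
print(len(request.items), request.items[0].parts[0].type.value)
```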
A JobRequest in XML format looks like this:
<JobRequest>
  <id>42f997fc-d085-4d9b-968b-3d73370412a3</id>
  <jobTemplateName>ImageToPdfWithHocr</jobTemplateName>
  <creatorName />
  <jobParameterMap />
  <priority>0</priority>
  <items>
    <items>
      <processingProperties />
      <indexData />
      <parts>
        <parts>
          <url>https://valid-url-to-file.jpg</url>
          <filename>car.jpg</filename>
          <type>BASE_PART</type>
          <mimeType>image/jpeg</mimeType>
          <processingProperties />
        </parts>
      </parts>
    </items>
  </items>
</JobRequest>
This request submits the file "car.jpg" for processing by the controller with the job template "ImageToPdfWithHocr". The URL in the request must be valid, meaning that the file must already have been uploaded to the storage. When a component such as the SourceScanner is used, XML files like the one above can be polled regularly from a directory, and the SourceScanner uploads the files in the process (the url can be empty in that case; the SourceScanner performs the upload based on the filename field).
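A request document of this shape can also be generated programmatically. The following sketch uses only the Python standard library and mirrors the element names of the XML shown above; the helper `build_job_request` is hypothetical, and generating the id on the client side is an assumption.

```python
import uuid
import xml.etree.ElementTree as ET


def build_job_request(job_template_name, file_url, filename, mime_type, priority=0):
    """Build a JobRequest XML string for one item with one part."""
    root = ET.Element("JobRequest")
    ET.SubElement(root, "id").text = str(uuid.uuid4())  # assumption: client-side id
    ET.SubElement(root, "jobTemplateName").text = job_template_name
    ET.SubElement(root, "creatorName")
    ET.SubElement(root, "jobParameterMap")
    ET.SubElement(root, "priority").text = str(priority)

    # In the XML sample, the list wrapper and its entries share the same element name.
    item = ET.SubElement(ET.SubElement(root, "items"), "items")
    ET.SubElement(item, "processingProperties")
    ET.SubElement(item, "indexData")

    part = ET.SubElement(ET.SubElement(item, "parts"), "parts")
    ET.SubElement(part, "url").text = file_url
    ET.SubElement(part, "filename").text = filename
    ET.SubElement(part, "type").text = "BASE_PART"
    ET.SubElement(part, "mimeType").text = mime_type
    ET.SubElement(part, "processingProperties")

    return ET.tostring(root, encoding="unicode")


xml_text = build_job_request(
    "ImageToPdfWithHocr",
    "https://valid-url-to-file.jpg",
    "car.jpg",
    "image/jpeg",
)
print(xml_text)
```

How the resulting XML is handed to the controller (REST call, SourceScanner directory, or otherwise) is outside the scope of this sketch.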
After a JobRequest has been issued to the controller, an ID for the corresponding job is returned. This ID is in some places also called job-ID or jobExecution-ID.
Job execution logic
When a job runs, the items of the job go from step to step according to the workflow. Steps can create new parts as the result of an image conversion, or perform actions on the item metadata.
When processing an item for a step, the step usually only processes the parts not marked as type META.
The usual scenario for an image conversion is (consider 1 item with 1 jpg part):
- The item is processed by the ToPDF step, which converts the JPG to a PDF.
- This result is added to the item as a new part.
- Depending on the configuration, the source part can be marked as META, which prevents further steps from processing the part again. In an image conversion, the source part is usually marked as META so that only the converted result image is processed further.
- The item now has two parts: the JPG marked as META and the PDF marked as BASE_PART.
- When calling the convenience method "getResultPartsForJob", the result would only be the PDF part (to get all parts including META, the "getItemsForJob" REST endpoint can be used).
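The scenario above can be traced with a small simulation. This is a sketch under stated assumptions, not worker code: `to_pdf_step` and `result_parts` are hypothetical stand-ins, no real conversion takes place, and part types are modelled as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    filename: str
    mime_type: str
    type: str = "BASE_PART"


@dataclass
class Item:
    parts: list = field(default_factory=list)


def to_pdf_step(item, mark_source_as_meta=True):
    """Stand-in for a conversion step; it only adds a placeholder PDF part."""
    # Only parts not marked META are picked up by the step.
    active = [p for p in item.parts if p.type != "META"]
    for source in active:
        stem = source.filename.rsplit(".", 1)[0]
        item.parts.append(Part(filename=stem + ".pdf", mime_type="application/pdf"))
        if mark_source_as_meta:
            source.type = "META"  # later steps will skip the source part


def result_parts(item):
    # Mirrors the idea behind "getResultPartsForJob": META parts are left out.
    return [p for p in item.parts if p.type != "META"]


item = Item(parts=[Part("car.jpg", "image/jpeg")])
to_pdf_step(item)
all_parts = [(p.filename, p.type) for p in item.parts]
results = [p.filename for p in result_parts(item)]
print(all_parts)  # [('car.jpg', 'META'), ('car.pdf', 'BASE_PART')]
print(results)    # ['car.pdf']
```

Running the step a second time on the same item would convert only the PDF part, because the JPG is now marked as META.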
JobTemplate
A jobTemplate contains the configuration for the workflow. It can consist of multiple stepTemplates, which define the configuration for a single step/worker.
Each step can have its own configuration, such as PDF conformance or resolution settings. These values can be set in the stepTemplate.
A workflow can be a simple list of steps which are executed sequentially like in this example. More complex workflows with flow decisions are also supported. See the chapter Workflows for more information on jobTemplates.
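The idea of a sequential workflow with per-step settings can be sketched as follows. This is only a conceptual model and not the jadice flow configuration format: the classes, the step names "ocr" and "toPdf", and the setting keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StepTemplate:
    name: str
    settings: dict = field(default_factory=dict)  # step-specific configuration


@dataclass
class JobTemplate:
    name: str
    step_templates: list = field(default_factory=list)


def run_sequentially(template, handlers, state):
    """Apply each step's handler in order, passing along that step's settings."""
    for step in template.step_templates:
        state = handlers[step.name](state, step.settings)
    return state


template = JobTemplate(
    name="ImageToPdfWithHocr",
    step_templates=[
        StepTemplate("ocr", {"language": "eng"}),
        StepTemplate("toPdf", {"conformance": "PDF/A-2b", "resolution": 300}),
    ],
)

# Handlers only record what they would do; real workers run remotely.
handlers = {
    "ocr": lambda state, s: state + [f"ocr({s['language']})"],
    "toPdf": lambda state, s: state + [f"toPdf({s['conformance']}, {s['resolution']} dpi)"],
}

trace = run_sequentially(template, handlers, [])
print(trace)  # ['ocr(eng)', 'toPdf(PDF/A-2b, 300 dpi)']
```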