Workers
Workers are separate processes, each of which performs a specific task.
Existing Workers
For more information about each worker, such as descriptions, execution and configuration settings, or examples, click the corresponding worker in the list below.
PDF
- worker-topdf - Converts supported input documents to PDF
- worker-htmltopdf - Converts HTML documents to PDF
- worker-pdfverifier - Validates PDFs against a selected PDF/A standard and PDF/A conformance level
- worker-pdfimageoptimization - Processes PDF files and shrinks the size of the images
- worker-pdfpermissions - Sets or removes passwords on PDF documents
Office
- worker-libreoffice - Converts office documents to PDF
- worker-msoffice - Converts Microsoft Office documents to PDF
Input / Output
- worker-output-s3 - Provides S3 URLs for all result files
- worker-oam-input - Reads document data from an OAM database
Email
- worker-emaildisassembly - Disassembles an email into its different parts
- worker-tnef - Converts TNEF and MSG files into RFC 822 messages
- worker-mailbodycreator - Converts email metadata to a PDF cover page and converts HTML, RTF, and plain-text bodies to PDF
Analyze files
- worker-analyzer - Analyzes a stream's content and tries to determine the content's MIME type
AFP Documents
- worker-afp-converter - Converts AFP documents to PDF
- worker-afp-stream-split - Splits huge combined AFP document streams into small streams
- worker-resource-bundler - Bundles separate resources to a resource group file
HTML / XML / XSL / Markdown
- worker-xslfo - Applies an XSL stylesheet and formats a PDF based on XSL-FO
- worker-pandoc - Converts several document formats to Markdown via Pandoc
Extraction
- worker-document-content-extractor - Extracts the content and structural information from a PDF
- worker-textextractor - Extracts texts from documents
Images
- worker-totiff - Converts input documents to TIFF
- worker-tessocr - Performs OCR analysis on TIFF and PDF documents and returns an enriched PDF, text or hOCR
- worker-pageimage - Generates an image for every page in the source document
- worker-imagemagick - Uses ImageMagick to convert HEIC/HEIF images to PNG
Text / content preparation
- worker-summarizer - Reads plain-text (txt) files and stores the summary as txt
Decompress
- worker-decompress - Decompresses compressed files into their original state
Annotations
- worker-anno-migrator - Migrates annotations that match a given document to another format matching a target document
Additional configuration
Workers are usually implemented as Spring Boot applications and run inside a container. Settings can be changed via the application.yaml configuration file.
Some workers may require access to external resources or custom configuration files on the classpath of the application. For these cases, the following configuration methods exist:
Adding external fonts
The external font path can be configured in the application.yaml:
jadice.extFontPath: /app/fonts
If this folder exists (e.g. mounted in the worker container), it will be used by the jadice font manager to read compatible font resources.
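As an illustration of how such a folder might be made available, the following Docker Compose fragment mounts a host directory to the configured font path. It is only a sketch: the service name, the image reference, and the host path ./fonts are hypothetical placeholders, and only the container path /app/fonts comes from the setting above.

```yaml
services:
  worker-example:                              # hypothetical service name
    image: example.org/worker-example:latest   # hypothetical image reference
    volumes:
      # Host folder with font files (assumed location) mounted to the path
      # configured in jadice.extFontPath. Mounted read-only on the assumption
      # that the font manager only reads from this folder.
      - ./fonts:/app/fonts:ro
```

If jadice.extFontPath is changed in the application.yaml, the container side of the mount would need to change accordingly.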
Adding an additional configuration file
Some workers may require additional configuration files placed in the classpath. Additional files must be placed in the following directory:
/app/libs/custom/application/
Adding a custom library to the application
Some workers may require additional JARs like JDBC drivers or specific client connection libraries. In general, external JARs must be placed in the folder:
/app/libs/custom/
JAR files mounted into this directory will be used when starting the application. Please refer to the specific worker documentation if specific files need to be placed here. In most cases, this is not required.
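The two directories described above could be populated with volume mounts, for example in a Docker Compose file. The fragment below is a minimal sketch under stated assumptions: the service name, image reference, host paths, and file names (jdbc-driver.jar, custom.properties) are hypothetical, and only the container directories /app/libs/custom/ and /app/libs/custom/application/ are taken from this page.

```yaml
services:
  worker-example:                              # hypothetical service name
    image: example.org/worker-example:latest   # hypothetical image reference
    volumes:
      # External JAR (e.g. a JDBC driver) placed in the custom library folder.
      - ./libs/jdbc-driver.jar:/app/libs/custom/jdbc-driver.jar:ro
      # Additional configuration file placed in the classpath folder.
      - ./config/custom.properties:/app/libs/custom/application/custom.properties:ro
```

Which files, if any, a particular worker expects in these locations is defined in that worker's own documentation.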