Local Docker Compose Deployment for OCR
In this tutorial, we will run a jadice flow OCR worker in a local Docker environment via docker compose.
For example, this setup can handle ad-hoc OCR requests from a jadice web toolkit integration (see jadice web toolkit OCR Addon).
Prerequisites
A Docker installation.
Packages and installation instructions for all supported operating systems can be found at docker.com.
We use the docker compose command to run the services later on.
To use the jadice flow components, a security token is needed (JADICE-FLOW-ACCESS-TOKEN).
You should get a flow access token together with your license. For testing purposes, a test license can also be obtained by sending a request to jadice-support@levigo.de.
Check your access to https://artifacts.jadice.com/ and https://registry.jadice.com/, since the images are pulled from there. The start-compose scripts we use later perform a docker login to the jadice registry.
How to complete this tutorial
You can start from scratch and create the complete configuration, or you can get the full example code and apply only the necessary changes. A Docker Compose deployment is useful for development or quick test setups. For production use, a container management system like Kubernetes is strongly recommended.
To start from scratch, move on to Configuration.
To skip the basics, do the following:
- Download and unzip the source repository for this tutorial, or clone it using Git:
git clone https://github.com/levigo/jadice-flow-getting-started.git
- cd into jadice-flow-getting-started/jadice-flow-tutorial-01/docker-compose/local-ocr/
- In the file .env, replace JADICE-FLOW-ACCESS-TOKEN with your access token.
- Jump ahead to Startup.
Configuration
In the following steps we create a configuration to provide the required services:
- jadice flow controller (jf-controller)
- Storage (eureka): The storage is used to store the input images for the OCR operations as well as their results.
- A worker configuration, here: jadice flow OCR Worker (jadice-flow-worker-tessocr)
Depending on the workers required for the specific tasks, more services may be required.
Creating the configuration folder
It is good practice to start by creating a directory where the configuration files will be placed. For example, on Windows the path could be C:\Docker-Compose\JadiceFlow\local-ocr.
Common practice is to create a specific folder for each service within the main configuration folder.
For example, controller-config/application.yml provides the configuration for the jadice flow controller.
In the docker-compose.yml, this configuration folder is mounted into the container jf-controller as a docker volume.
In general, all services in docker-compose.yml can be configured in this manner, as we show in the subsequent paragraphs.
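As a minimal sketch, the per-service folder layout described above could be created with a small shell function. The function name create_config_layout is hypothetical. The three folder names are the ones mounted by the docker-compose.yml of this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: create one configuration folder per service below a root folder.
# create_config_layout is a hypothetical helper, not part of jadice flow.
create_config_layout() {
  local root="${1:-.}"
  mkdir -p \
    "${root}/controller-config" \
    "${root}/worker-config" \
    "${root}/eureka-config"
}

# Example: create the folders in the current directory.
# create_config_layout .
```

The function is idempotent, so running it again on an existing configuration folder changes nothing.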
Creating a start script
Create a .env file with the required environment variables and then start the services by calling docker compose up.
In this example, we must set the following variables:
- JF_CONTAINER_REGISTRY_JADICE - Path to the jadice container registry (registry.jadice.com)
- JF_ACCESS_TOKEN - An access token used by the controller service for authentication towards the worker
- EUREKA_USERNAME - The username for the eureka storage
- EUREKA_PASSWORD - The password for the eureka storage
- COMPOSE_CONVERT_WINDOWS_PATHS - Enables conversion of Windows-style host paths when running with Docker Desktop on Microsoft Windows
Create the file .env
.env
# container registries
JF_CONTAINER_REGISTRY_JADICE=registry.jadice.com/
# controller
JF_ACCESS_TOKEN=THE-[JADICE-FLOW-ACCESS-TOKEN]
# storage: eureka
EUREKA_USERNAME=user
EUREKA_PASSWORD=password
# for running with Docker Desktop on Microsoft Windows
COMPOSE_CONVERT_WINDOWS_PATHS=1
Replace [JADICE-FLOW-ACCESS-TOKEN] with your access token.
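A forgotten placeholder or a missing variable only shows up once the containers start. The following sketch checks the .env file beforehand. The helper name check_env_file is hypothetical, and the check assumes that a real access token never contains the bracketed placeholder text.

```shell
#!/usr/bin/env bash
# Sketch: verify that an env file defines the variables used in this tutorial
# and that the access-token placeholder has been replaced.
# check_env_file is a hypothetical helper, not part of jadice flow.
check_env_file() {
  local env_file="${1:-.env}"
  local rc=0
  local name
  if [ ! -f "${env_file}" ]; then
    echo "missing file: ${env_file}" >&2
    return 1
  fi
  for name in JF_CONTAINER_REGISTRY_JADICE JF_ACCESS_TOKEN EUREKA_USERNAME EUREKA_PASSWORD; do
    # A variable counts as set when a line starts with NAME= followed by a value.
    if ! grep -Eq "^${name}=.+" "${env_file}"; then
      echo "missing or empty: ${name}" >&2
      rc=1
    fi
  done
  # Assumption: a real token does not contain the bracketed placeholder.
  if grep -Eq '^JF_ACCESS_TOKEN=.*\[JADICE-FLOW-ACCESS-TOKEN\]' "${env_file}"; then
    echo "JF_ACCESS_TOKEN still contains the placeholder" >&2
    rc=1
  fi
  return "${rc}"
}

# Example: check_env_file .env && echo ".env looks complete"
```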
Finally, we can start the services by executing a script. During execution, a login to the Docker registry may be required.
Sample start scripts for Windows and Linux follow.
Windows
Create the file start-compose.cmd:
start-compose.cmd
@echo off
echo Starting docker compose for jadice flow with OCR
echo Login to levigo container registry
docker login registry.jadice.com
IF NOT EXIST eureka-data mkdir eureka-data
docker compose --env-file .env up
Linux
Create the file start-compose.sh
start-compose.sh
#!/usr/bin/env bash
set -euo pipefail ;
_login_docker_registry(){
  echo ">>>[start-compose] login to levigo container registry" ;
  docker login registry.jadice.com ;
  return 0 ;
} ;
_configure_container_mounts(){
  echo ">>>[start-compose] configure container mounts" ;
  local _sudo="" ;
  local _uid="$(id -u)" ;
  if [[ ! "${_uid}" == "0" ]] ; then
    _sudo="sudo"
  fi ;
  ${_sudo} mkdir -p ./eureka-data/ ;
  ${_sudo} chown -R ${_uid}:538446 ./controller-config/ ;
  ${_sudo} chown -R ${_uid}:538446 ./worker-config/ ;
  ${_sudo} chown -R ${_uid}:0 ./eureka-config/ ;
  ${_sudo} chown -R ${_uid}:0 ./eureka-data/ ;
  return 0 ;
} ;
_start_docker_compose_stack(){
  echo ">>>[start-compose] start docker-compose stack" ;
  echo ">>>[start-compose] you can follow the logs with 'docker compose logs -f'" ;
  docker compose up -d ;
  return 0 ;
} ;
_main() {
  _login_docker_registry ;
  _configure_container_mounts ;
  _start_docker_compose_stack ;
  return 0 ;
}
_main ;
Create the 'docker-compose.yml'
The docker-compose.yml is the Docker-Compose main configuration file. Create this file in the configuration root folder.
Add the following services to this file:
- jf-controller - jadice flow main service
- eureka - The storage service used by the controller and the worker
- Additionally, a worker is required. In the tutorial example, the jadice-flow-worker-tessocr OCR worker is used for OCR.
The service/container names in the docker-compose.yml can be used for network communication between the containers.
Example docker-compose.yml:
docker-compose.yml
---
version: "2.4"
networks:
  jadice-flow-network:
    driver: bridge
services:
  jf-controller:
    mem_limit: "4294967296"
    mem_reservation: "2147483648"
    image: "${JF_CONTAINER_REGISTRY_JADICE}jadice-flow-controller:0.26.5"
    user: '538446:538446'
    networks:
      - jadice-flow-network
    restart: always
    environment:
      JF_ACCESS_TOKEN: ${JF_ACCESS_TOKEN}
      EUREKA_ENDPOINT: http://eureka:8080
      EUREKA_USERNAME: "${EUREKA_USERNAME}"
      EUREKA_PASSWORD: "${EUREKA_PASSWORD}"
    volumes:
      - ./controller-config:/app/config
    ports:
      - "8080:8080"
  jadice-flow-worker-tessocr:
    mem_limit: "8589934592"
    mem_reservation: "4294967296"
    image: "${JF_CONTAINER_REGISTRY_JADICE}jf-worker-tessocr:1.8.0"
    networks:
      - jadice-flow-network
    restart: always
    user: '538446:538446'
    environment:
      EUREKA_ENDPOINT: http://eureka:8080
      EUREKA_USERNAME: "${EUREKA_USERNAME}"
      EUREKA_PASSWORD: "${EUREKA_PASSWORD}"
    volumes:
      - ./worker-config:/app/config
    ports:
      - "7081:8080"
  eureka:
    mem_limit: "4294967296"
    mem_reservation: "2147483648"
    image: "${JF_CONTAINER_REGISTRY_JADICE}neverpile-eureka-boxed:0.2.7"
    restart: always
    volumes:
      - ./eureka-config:/config
      - ./eureka-data:/data/neverpile-eureka_default
    environment:
      EUREKA_USERNAME: "${EUREKA_USERNAME}"
      EUREKA_PASSWORD: "${EUREKA_PASSWORD}"
    ports:
      - "8085:8080"
    networks:
      - jadice-flow-network
...
JF_CONTAINER_REGISTRY_JADICE: A variable for the container registry from which the jadice flow container images are pulled. This can be the levigo registry registry.jadice.com or a proxy of it.
The containers are attached to the jadice-flow-network. The main configuration parameters are taken from the environment variables defined in the .env file.
Now we can add the configuration files for our services. As seen in the docker-compose.yml, the relative host paths are mounted directly into the container's /config or /app/config directory.
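The mem_limit and mem_reservation values in the docker-compose.yml are plain byte counts, which are hard to read at a glance. The following sketch shows how they relate to GiB. The helper name gib_to_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert a whole number of GiB into bytes (1 GiB = 1024^3 bytes).
# gib_to_bytes is a hypothetical helper used only for illustration.
gib_to_bytes() {
  echo "$(( $1 * 1024 * 1024 * 1024 ))"
}

gib_to_bytes 2   # prints 2147483648 (mem_reservation of controller and eureka)
gib_to_bytes 4   # prints 4294967296 (mem_limit of controller and eureka, mem_reservation of the worker)
gib_to_bytes 8   # prints 8589934592 (mem_limit of the OCR worker)
```

In this example, the OCR worker may therefore use up to 8 GiB, twice the limit of the controller and the storage.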
jadice flow controller configuration
(Service name jf-controller in docker-compose.yml)
Create the following files:
- controller-config/application.yml
- controller-config/jobtemplates.yml
- controller-config/workers.yml
These files contain the configuration of this jadice flow installation. They look like this:
controller-config/application.yml
---
server:
  port: 8080
spring:
  config:
    ## worker and job configuration files:
    import: "/app/config/jobtemplates.yml,/app/config/workers.yml"
  datasource:
    url: "jdbc:h2:mem:jadice-flow-db;INIT=CREATE SCHEMA IF NOT EXISTS JADICE_FLOW"
    username: jadice-flow-controller
    password: changemeorkeepmeidontcare
# H2-Console
h2-console-config:
  enabled: true
  port: 8082
# Storage
publisher:
  # required
  internalEndpoint: "http://eureka:8080"
  eureka:
    endpoint: ${EUREKA_ENDPOINT}
    username: ${EUREKA_USERNAME}
    password: ${EUREKA_PASSWORD}
# Jadice flow main config
jadice-flow:
  server-url: http://localhost:8080/
  securityToken: ${JF_ACCESS_TOKEN}
  system:
    lockJobConfiguration: false
    configFileJobs: /app/config/jobtemplates.yml
jadice:
  license-configuration:
    license: |
      ----BEGIN LICENSE----
      abcdefghijklmnopqrstuvwxyz
      ----END LICENSE----
    fingerprint: 1234567890
    public-key: |
      -----BEGIN PUBLIC KEY-----
      abcdefghijklmnopqrstuvwxyz
      -----END PUBLIC KEY-----
...
Note the following settings:
- jadice-flow.securityToken - The access token required to access the flow workers.
- jadice.license-configuration - A jadice license, required to start the controller and run jobs.
The jadice flow controller uses an in-memory H2 database as its runtime database. Other runtime databases can be configured via spring.datasource in the application.yml.
The jobtemplates.yml contains the definitions of the workflows. In this tutorial, there is only one simple job template with a single step: OCR.
controller-config/jobtemplates.yml
jadice-flow.jobs:
  jobTemplates:
    - jobName: "ocr"
      description: "Performs optical character recognition for the given input image(s). Default output is one plain text part and one HOCR part."
      properties: {}
      enabled: true
      stepTemplates:
        - stepName: "OCR"
          workerDefinitionName: "TessOCR"
          inputMimeTypes:
            - "image/png"
            - "application/pdf"
            - "application/octet-stream"
            - "image/jpeg"
            - "image/tiff"
            - "image/bmp"
            - "image/gif"
          expectsNewPartResult: true
          markSrcAsMetaOnResult: true
          parameters:
            - name: "output-formats"
              type: "com.jadice.flow.worker.ocr.OCROutputSetting"
              subTypes: []
              value: "\"TEXT_AND_HOCR\""
              description: "OCR output format(s)"
      jobFlow:
        - from: ""
          "on": "*"
          to: "OCR"
The controller-config/workers.yml contains the definitions of the workers. In our case there is only the worker "TessOCR".
controller-config/workers.yml
jadice-flow.workers:
  workerDefinitions:
  - workerName: "TessOCR"
    description: "Performs optical character recognition on the given image parts\
      \ and stores the result as new part"
    processorClass: "com.jadice.flow.controller.server.processor.impl.TessOCRProcessor"
    workerURL: "http://jadice-flow-worker-tessocr:8080/"
    infoTags:
    - "PART_BASED"
    - "IMAGE_PROCESSING"
    - "REMOTE"
    workerParameters:
    - name: "output-formats"
      type: "com.jadice.flow.worker.ocr.OCROutputSetting"
      subTypes: null
      value: "\"TEXT\""
      description: "OCR output format(s)"
Storage Configuration
(Service name eureka in docker-compose.yml)
The eureka storage only requires a username and a password. The values are set through docker compose, but we need to add the placeholders to its application.yml.
Create the following file: eureka-config/application.yml
eureka-config/application.yml
---
server:
  port: 8080
spring:
  application:
    name: neverpile eureka
  security:
    user:
      name: "${EUREKA_USERNAME}"
      password: "${EUREKA_PASSWORD}"
...
Note
The jadice flow components do not clean up the files in the storage. The integrating application is responsible for cleaning them up.
Usually, the OCR data is only needed for a short time. A simple mechanism could, for example, delete the contents of eureka-data from within the start script before launching.
Instead of eureka, you can also use an S3-compatible storage such as Amazon S3 or MinIO.
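The cleanup idea from the note could be sketched as a shell function that the Linux start script calls before docker compose up. The name clean_eureka_data is hypothetical. The sketch assumes the current user may delete the files; if the container created them under a different owner, the call may need sudo.

```shell
#!/usr/bin/env bash
# Sketch: empty the eureka data folder but keep the folder itself,
# because it is the source of the bind mount in docker-compose.yml.
# clean_eureka_data is a hypothetical helper, not part of jadice flow.
clean_eureka_data() {
  local data_dir="${1:-./eureka-data}"
  # Nothing to do if the folder does not exist yet.
  [ -d "${data_dir}" ] || return 0
  # -mindepth 1 excludes the folder itself; -delete removes contents depth-first.
  find "${data_dir}" -mindepth 1 -delete
}

# Example: clean_eureka_data ./eureka-data
```

Run the cleanup only while the stack is stopped, since the storage container would otherwise lose data it is still serving.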
jadice flow OCR worker configuration
Create the file worker-config/application.yml
worker-config/application.yml
---
stage: dev
publisher:
  eureka:
    endpoint: "${EUREKA_ENDPOINT}"
    username: "${EUREKA_USERNAME}"
    password: "${EUREKA_PASSWORD}"
 
spring:
  application:
    name: jadice-flow-worker-ocr
 
opentracing:
  jaeger:
    log-spans: false
    service-name: ${spring.application.name}
    tags:
      stage: ${stage}
 
management:
  endpoint.health.enabled: true
  endpoint.prometheus.enabled: true
  endpoint.info.enabled: true
  endpoints:
    enabled-by-default: false
    web:
      exposure:
        include: "health,prometheus,info"
  metrics:
    enable:
      all: true
  endpoint:
    health:
      show-details: always
 
logging:
  level:
    root: INFO
    com.jadice.flow.worker.tessocr: INFO
...
Note the following setting:
- publisher.eureka - The eureka configuration used to access the image data and store the results.
Configuration summary
Finally, we have the following directory structure:
- controller-config/application.yml
- controller-config/jobtemplates.yml
- controller-config/workers.yml
- eureka-config/application.yml
- worker-config/application.yml
- .env
- docker-compose.yml
- start-compose.sh or start-compose.cmd
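Before the first start, it can help to confirm that every file created in this tutorial is in place. The following sketch does that for a given configuration root. The function name check_config_layout is hypothetical, and the start scripts are left out because only one of them is needed.

```shell
#!/usr/bin/env bash
# Sketch: report every expected configuration file that is missing.
# check_config_layout is a hypothetical helper, not part of jadice flow.
check_config_layout() {
  local root="${1:-.}"
  local rc=0
  local f
  for f in \
    .env \
    docker-compose.yml \
    controller-config/application.yml \
    controller-config/jobtemplates.yml \
    controller-config/workers.yml \
    eureka-config/application.yml \
    worker-config/application.yml; do
    if [ ! -f "${root}/${f}" ]; then
      echo "missing: ${f}" >&2
      rc=1
    fi
  done
  return "${rc}"
}

# Example: check_config_layout . && echo "configuration is complete"
```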
Startup
Switch to the configuration root directory and start the jadice flow instance by running your start-compose script in a command shell.
You can stop and remove the created containers and the network by running docker compose down. The configuration folders and the eureka-data folder on the host are kept.
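After startup, the services need some time before they accept requests. The following sketch polls a URL until it responds. The names probe_url and wait_for_url are hypothetical. The health URL in the example is an assumption: it combines the worker's host port 7081 from the docker-compose.yml with the Spring Boot Actuator default path /actuator/health, which applies only if the worker image does not change that path.

```shell
#!/usr/bin/env bash
# Sketch: wait until a service answers on a URL, with a bounded number of retries.
# probe_url and wait_for_url are hypothetical helpers, not part of jadice flow.

# Succeeds when the URL returns a successful HTTP status.
probe_url() {
  curl --silent --fail --output /dev/null "$1"
}

wait_for_url() {
  local url="$1"
  local attempts="${2:-30}"   # maximum number of probes
  local delay="${3:-2}"       # seconds to sleep between probes
  local i=1
  while [ "${i}" -le "${attempts}" ]; do
    if probe_url "${url}"; then
      echo "up: ${url}"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "timed out: ${url}" >&2
  return 1
}

# Example (assumed health URL of the OCR worker):
# wait_for_url http://localhost:7081/actuator/health
```

Keeping the probe in its own function makes it easy to swap curl for another check, for example a different endpoint per service.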