Local Docker Compose Deployment for OCR
In this tutorial, we will run a jadice flow OCR worker in a local Docker environment via docker compose.
For example, this setup can be used to handle ad-hoc OCR requests by a jadice web toolkit integration (see jadice web toolkit OCR Addon).
Prerequisites
A Docker installation. Packages for the various operating systems and instructions on how to install Docker can be found at docker.com.
We use the docker compose command to run the services later on.
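Before continuing, it can help to confirm that the required command-line tools are on the PATH. The following is a minimal sketch; the helper name require_command is hypothetical and not part of Docker or jadice flow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given command is available on the PATH,
# otherwise prints a message to stderr and returns a non-zero status.
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "required command not found: $1" >&2
  return 1
}
```

A possible use is `require_command docker && docker compose version`, which prints the version of the Compose plugin if both are installed.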
To use the jadice flow components, a security token is needed (JADICE-FLOW-ACCESS-TOKEN).
You should get a flow access token together with your license. For testing purposes, a test license can also be obtained by sending a request to jadice-support@levigo.de.
Check your access to https://artifacts.jadice.com/ and https://registry.jadice.com/. Images will be pulled from there. The start-compose scripts we use later perform a docker login to the jadice registry.
How to complete this tutorial
You can start from scratch and create the complete configuration, or you can get the full example code and apply only the necessary changes. The Docker-based deployment is useful for development or quick test setups. For production use, a container management system such as Kubernetes is strongly recommended.
To start from scratch, move on to Configuration.
To skip the basics, do the following:
- Download and unzip the source repository for this tutorial, or clone it using Git:
git clone https://github.com/levigo/jadice-flow-getting-started.git
- cd into jadice-flow-getting-started/jadice-flow-tutorial-01/docker-compose/local-ocr/
- In the file .env, replace JADICE-FLOW-ACCESS-TOKEN with your access token
- Jump ahead to Startup.
Configuration
In the following steps we create a configuration to provide the required services:
- jadice flow controller (jf-controller)
- Storage (eureka): The storage is used to store the input images for the OCR operations as well as their results.
- A worker configuration, here: jadice flow OCR worker (jadice-flow-worker-tessocr)
Depending on the workers required for the specific tasks, more services may be required.
Creating the configuration folder
It is good practice to start by creating a directory where the configuration files will be placed. For example, on Windows the path could be C:\Docker-Compose\JadiceFlow\local-ocr.
Common practice is to create a specific folder for each service within the main configuration folder.
For example, controller-config/application.yml provides the configuration for the jadice flow controller.
In the docker-compose.yml, this configuration folder is mounted into the container jf-controller as a Docker volume.
In general, all services in docker-compose.yml can be configured in this manner, as we show in the subsequent paragraphs.
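The per-service folder layout can be created in one step. The following is a minimal sketch; the function name create_config_dirs is hypothetical, and the folder names are taken from the volume mounts used in the docker-compose.yml of this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates one sub-folder per service below the given
# configuration root (defaults to the current directory).
create_config_dirs() {
  local root="${1:-.}"
  local dir
  for dir in controller-config worker-config eureka-config eureka-data; do
    mkdir -p "${root}/${dir}" || return 1
  done
}
```

Calling `create_config_dirs .` inside the configuration root is safe to repeat, because mkdir -p leaves existing folders untouched.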
Creating a start script
Create a .env file with the required environment variables and then start the services by calling docker compose up.
In this example, we must set the following variables:
- JF_CONTAINER_REGISTRY_JADICE - Path to the jadice container registry (registry.jadice.com)
- JF_ACCESS_TOKEN - An access token used by the controller service for authentication towards the worker
- EUREKA_USERNAME - The username for the eureka storage
- EUREKA_PASSWORD - The password for the eureka storage
- COMPOSE_CONVERT_WINDOWS_PATHS - Converts Windows-style host paths in volume definitions when running with Docker Desktop on Microsoft Windows
Create the file .env
.env
# container registries
JF_CONTAINER_REGISTRY_JADICE=registry.jadice.com/
# controller
JF_ACCESS_TOKEN=THE-[JADICE-FLOW-ACCESS-TOKEN]
# storage: eureka
EUREKA_USERNAME=user
EUREKA_PASSWORD=password
# for running with Docker Desktop on Microsoft Windows
COMPOSE_CONVERT_WINDOWS_PATHS=1
Replace [JADICE-FLOW-ACCESS-TOKEN] with your access token.
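A forgotten variable or an unreplaced placeholder usually only shows up when the containers start. The following sketch checks the .env file up front; the function name check_env_file is hypothetical, and the check is a plain text match, not a full parser for the .env syntax.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports variables that are missing or empty in the
# given .env file and warns if the token placeholder is still present.
check_env_file() {
  local file="$1" name rc=0
  for name in JF_CONTAINER_REGISTRY_JADICE JF_ACCESS_TOKEN \
      EUREKA_USERNAME EUREKA_PASSWORD; do
    # A variable counts as set if "NAME=" is followed by a non-blank character.
    if ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=[[:space:]]*[^[:space:]]" "$file"; then
      echo "missing or empty: ${name}"
      rc=1
    fi
  done
  if grep -q 'JADICE-FLOW-ACCESS-TOKEN' "$file"; then
    echo "placeholder JADICE-FLOW-ACCESS-TOKEN has not been replaced"
    rc=1
  fi
  return "$rc"
}
```

For example, `check_env_file .env || exit 1` could be placed at the top of a start script.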
Finally, we can start the services by executing a script. During execution, a login to the Docker registry may be required.
Sample start scripts for Windows and Linux follow.
Windows
Create the file start-compose.cmd:
start-compose.cmd
@echo off
echo Starting docker compose for jadice flow with OCR
echo Login to levigo container registry
docker login registry.jadice.com
IF NOT EXIST eureka-data mkdir eureka-data
docker compose --env-file .env up
Linux
Create the file start-compose.sh
start-compose.sh
#!/usr/bin/env bash
set -euo pipefail ;
_login_docker_registry(){
  echo ">>>[start-compose] login to levigo container registry" ;
  docker login registry.jadice.com ;
  return 0 ;
} ;
_configure_container_mounts(){
  echo ">>>[start-compose] configure container mounts" ;
  local _sudo="" ;
  local _uid="$(id -u)" ;
  if [[ ! "${_uid}" == "0" ]] ; then
    _sudo="sudo"
  fi ;
  ${_sudo} mkdir -p ./eureka-data/ ;
  ${_sudo} chown -R ${_uid}:538446 ./controller-config/ ;
  ${_sudo} chown -R ${_uid}:538446 ./worker-config/ ;
  ${_sudo} chown -R ${_uid}:0 ./eureka-config/ ;
  ${_sudo} chown -R ${_uid}:0 ./eureka-data/ ;
  return 0 ;
} ;
_start_docker_compose_stack(){
  echo ">>>[start-compose] start docker-compose stack" ;
  echo ">>>[start-compose] you can follow the logs with 'docker compose logs -f'" ;
  docker compose up -d ;
  return 0 ;
} ;
_main() {
  _login_docker_registry ;
  _configure_container_mounts ;
  _start_docker_compose_stack ;
  return 0 ;
}
_main ;
Create the 'docker-compose.yml'
The docker-compose.yml is the Docker Compose main configuration file. Create this file in the configuration root folder.
Add the following services to this file:
- jf-controller - jadice flow main service
- Additionally, a worker is required. In the tutorial example, the jadice-flow-worker-tessocr OCR worker is used for OCR.
The service/container names in the docker-compose.yml can be used for network communication between the containers.
Example docker-compose.yml:
docker-compose.yml
---
version: "2.4"
networks:
  jadice-flow-network:
    driver: bridge
services:
  jf-controller:
    mem_limit: "4294967296"
    mem_reservation: 2147483648
    image: "${JF_CONTAINER_REGISTRY_JADICE}jadice-flow-controller:0.26.5"
    user: '538446:538446'
    networks:
      - jadice-flow-network
    restart: always
    environment:
      JF_ACCESS_TOKEN: ${JF_ACCESS_TOKEN}
      EUREKA_ENDPOINT: http://eureka:8080
      EUREKA_USERNAME: "${EUREKA_USERNAME}"
      EUREKA_PASSWORD: "${EUREKA_PASSWORD}"
    volumes:
      - ./controller-config:/app/config
    ports:
      - "8080:8080"
  jadice-flow-worker-tessocr:
    mem_limit: "8589934592"
    mem_reservation: "4294967296"
    image: "${JF_CONTAINER_REGISTRY_JADICE}jf-worker-tessocr:1.8.0"
    networks:
      - jadice-flow-network
    restart: always
    user: '538446:538446'
    environment:
      EUREKA_ENDPOINT: http://eureka:8080
      EUREKA_USERNAME: "${EUREKA_USERNAME}"
      EUREKA_PASSWORD: "${EUREKA_PASSWORD}"
    volumes:
      - ./worker-config:/app/config
    ports:
      - "7081:8080"
  eureka:
    mem_limit: "4294967296"
    mem_reservation: "2147483648"
    image: "${JF_CONTAINER_REGISTRY_JADICE}neverpile-eureka-boxed:0.2.7"
    restart: always
    volumes:
      - ./eureka-config:/config
      - ./eureka-data:/data/neverpile-eureka_default
    environment:
      EUREKA_USERNAME: "${EUREKA_USERNAME}"
      EUREKA_PASSWORD: "${EUREKA_PASSWORD}"
    ports:
      - "8085:8080"
    networks:
      - jadice-flow-network
...
JF_CONTAINER_REGISTRY_JADICE: A variable for the container registry to obtain jadice flow container images from. This can be the levigo registry 'registry.jadice.com' or a proxy of it.
The containers will be added to the jadice-flow-network. The main configuration parameters are taken from the environment variables defined in the .env file.
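Each service publishes its container port 8080 on a different host port (8080, 7081 and 8085 in this example). When more workers are added, it is easy to reuse a host port by accident. The following sketch lists the published host ports and reports duplicates; the function names are hypothetical, and the text match only covers the short "HOST:CONTAINER" port syntax used in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the host part of every "HOST:CONTAINER" port
# mapping found in the given compose file, one per line.
published_host_ports() {
  sed -n 's/^[[:space:]]*-[[:space:]]*"\{0,1\}\([0-9][0-9]*\):[0-9][0-9]*"\{0,1\}[[:space:]]*$/\1/p' "$1"
}

# Hypothetical helper: prints every host port that is published more than once.
duplicate_host_ports() {
  published_host_ports "$1" | sort | uniq -d
}
```

An empty result from `duplicate_host_ports docker-compose.yml` means no host port is used twice. For a full syntax check, `docker compose config --quiet` validates the file including variable substitution.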
Now we can add additional configuration files to our services. As seen in the docker-compose.yml, the relative paths are mounted directly into the container's /config or /app/config directory.
jadice flow controller configuration
(Service name jf-controller in docker-compose.yml)
Create the following files:
- controller-config/application.yml
- controller-config/jobtemplates.yml
- controller-config/workers.yml
These files contain the configuration of this jadice flow installation. They look like this:
controller-config/application.yml
---
server:
  port: 8080
spring:
  config:
    ## worker and job configuration files:
    import: "/app/config/jobtemplates.yml,/app/config/workers.yml"
  datasource:
    url: "jdbc:h2:mem:jadice-flow-db;INIT=CREATE SCHEMA IF NOT EXISTS JADICE_FLOW"
    username: jadice-flow-controller
    password: changemeorkeepmeidontcare
# H2-Console
h2-console-config:
  enabled: true
  port: 8082
# Storage
publisher:
  # required
  internalEndpoint: "http://eureka:8080"
  eureka:
    endpoint: ${EUREKA_ENDPOINT}
    username: ${EUREKA_USERNAME}
    password: ${EUREKA_PASSWORD}
# Jadice flow main config
jadice-flow:
  server-url: http://localhost:8080/
  securityToken: ${JF_ACCESS_TOKEN}
  system:
    lockJobConfiguration: false
    configFileJobs: /app/config/jobtemplates.yml
jadice:
  license-configuration:
    license: |
      ----BEGIN LICENSE----
      abcdefghijklmnopqrstuvwxyz
      ----END LICENSE----
    fingerprint: 1234567890
    public-key: |
      -----BEGIN PUBLIC KEY-----
      abcdefghijklmnopqrstuvwxyz
      -----END PUBLIC KEY-----
...
Note the following settings:
- jadice-flow.securityToken - The access token required to access the flow workers.
- jadice.license-configuration - A jadice license required to start up the controller and run jobs.
The jadice flow controller uses an H2 database as its runtime database. Other runtime databases can be configured via spring.datasource in the application.yml.
The jobtemplates.yml contains the definitions of the workflows. In this tutorial there is only one simple jobTemplate with a single step: OCR.
controller-config/jobtemplates.yml
jadice-flow.jobs:
  jobTemplates:
    - jobName: "ocr"
      description: "Performs optical character recognition for the given input image(s). Default output is one plain text part and one HOCR part."
      properties: {}
      enabled: true
      stepTemplates:
        - stepName: "OCR"
          workerDefinitionName: "TessOCR"
          inputMimeTypes:
            - "image/png"
            - "application/pdf"
            - "application/octet-stream"
            - "image/jpeg"
            - "image/tiff"
            - "image/bmp"
            - "image/gif"
          expectsNewPartResult: true
          markSrcAsMetaOnResult: true
          parameters:
            - name: "output-formats"
              type: "com.jadice.flow.worker.ocr.OCROutputSetting"
              subTypes: []
              value: "\"TEXT_AND_HOCR\""
              description: "OCR output format(s)"
      jobFlow:
        - from: ""
          "on": "*"
          to: "OCR"
The controller-config/workers.yml contains the definitions of the workers. In our case there is only the worker "TessOCR".
controller-config/workers.yml
jadice-flow.workers:
  workerDefinitions:
    - workerName: "TessOCR"
      description: "Performs optical character recognition on the given image parts\
        \ and stores the result as new part"
      processorClass: "com.jadice.flow.controller.server.processor.impl.TessOCRProcessor"
      workerURL: "http://jadice-flow-worker-tessocr:8080/"
      infoTags:
        - "PART_BASED"
        - "IMAGE_PROCESSING"
        - "REMOTE"
      workerParameters:
        - name: "output-formats"
          type: "com.jadice.flow.worker.ocr.OCROutputSetting"
          subTypes: null
          value: "\"TEXT\""
          description: "OCR output format(s)"
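The workerDefinitionName of each step in jobtemplates.yml refers to a workerName in workers.yml, so a typo in either file breaks the link between the two. The following sketch compares the names with a simple text match. The function names are hypothetical, the match is not a YAML parser, and worker names are assumed to contain no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the value of every 'KEY: "value"' line in a file.
yaml_values() {
  local key="$1" file="$2"
  sed -n "s/^[[:space:]-]*${key}:[[:space:]]*\"\{0,1\}\([^\"]*\)\"\{0,1\}[[:space:]]*\$/\1/p" "$file"
}

# Hypothetical helper: reports steps whose worker is not defined in workers.yml.
check_worker_references() {
  local jobs="$1" workers="$2" name rc=0
  for name in $(yaml_values workerDefinitionName "$jobs"); do
    if ! yaml_values workerName "$workers" | grep -Fxq "$name"; then
      echo "step references unknown worker: ${name}"
      rc=1
    fi
  done
  return "$rc"
}
```

A possible call is `check_worker_references controller-config/jobtemplates.yml controller-config/workers.yml`, which prints nothing and returns 0 when every referenced worker is defined.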
Storage Configuration
(Service name eureka in docker-compose.yml)
The eureka storage only requires setting a username and password. The values are set through Docker Compose, but we need to add the placeholders to its application.yml.
Create the following file: eureka-config/application.yml
eureka-config/application.yml
---
server:
  port: 8080
spring:
  application:
    name: neverpile eureka
  security:
    user:
      name: "${EUREKA_USERNAME}"
      password: "${EUREKA_PASSWORD}"
...
Note
The files will not be cleaned by the jadice flow components. It is the responsibility of the integrating application to perform a cleanup of the files.
Usually, the OCR data is not needed for a long time; a simple mechanism could, for example, delete the contents of eureka-data from within the start script before launching.
Instead of eureka you can also use an S3 compatible storage, like Amazon S3 or minio.
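The cleanup mentioned in the note could be sketched as follows. The function name clean_eureka_data is hypothetical; the folder name eureka-data is the one mounted in the docker-compose.yml of this tutorial. The -mindepth and -delete options are available in GNU and BSD find but are not part of POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: removes everything inside the storage folder but keeps
# the folder itself, so the volume mount stays valid.
clean_eureka_data() {
  local dir="${1:-./eureka-data}"
  # Nothing to do if the folder does not exist yet.
  [ -d "$dir" ] || return 0
  find "$dir" -mindepth 1 -delete
}
```

Such a call belongs before docker compose up, while the eureka container is stopped. Depending on which user the container writes files as, the cleanup may need elevated permissions.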
Jadice flow OCR worker
Create the file worker-config/application.yml
worker-config/application.yml
---
stage: dev
publisher:
  eureka:
    endpoint: "${EUREKA_ENDPOINT}"
    username: "${EUREKA_USERNAME}"
    password: "${EUREKA_PASSWORD}"
spring:
  application:
    name: jadice-flow-worker-ocr
opentracing:
  jaeger:
    log-spans: false
    service-name: ${spring.application.name}
    tags:
      stage: ${stage}
management:
  endpoint.health.enabled: true
  endpoint.prometheus.enabled: true
  endpoint.info.enabled: true
  endpoints:
    enabled-by-default: false
    web:
      exposure:
        include: "health,prometheus,info"
  metrics:
    enable:
      all: true
  endpoint:
    health:
      show-details: always
logging:
  level:
    root: INFO
    com.jadice.flow.worker.tessocr: INFO
...
Note the following setting:
- publisher.eureka - The eureka configuration to access the image data and store results
Configuration summary
Finally, we have the following directory structure:
controller-config/application.yml
controller-config/jobtemplates.yml
controller-config/workers.yml
eureka-config/application.yml
worker-config/application.yml
.env
docker-compose.yml
start-compose.sh or start-compose.cmd
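To confirm that nothing was left out before the first start, the files created in this tutorial can be checked with a short sketch like the following. The function name missing_config_files is hypothetical; the start scripts are not checked because only one of them is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists the tutorial's configuration files that do not
# exist below the given root (defaults to the current directory).
missing_config_files() {
  local root="${1:-.}" file rc=0
  for file in .env docker-compose.yml \
      controller-config/application.yml \
      controller-config/jobtemplates.yml \
      controller-config/workers.yml \
      eureka-config/application.yml \
      worker-config/application.yml; do
    if [ ! -f "${root}/${file}" ]; then
      echo "missing: ${file}"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `missing_config_files .` in the configuration root prints one line per missing file and returns a non-zero status if any file is absent.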
Startup
Switch to the configuration root directory and start the jadice flow instance by simply running your command script start-compose in a command shell.
You can delete all created resources by running docker compose down.
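The containers need some time before they accept requests. The following sketch retries a command until it succeeds; the function name wait_until is hypothetical. The usage example below assumes that curl is installed and that the worker serves its health endpoint under Spring Boot's default path /actuator/health on the host port 7081 published in this tutorial's docker-compose.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retries a command until it succeeds or the attempts
# run out. Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until 30 2 curl -fsS http://localhost:7081/actuator/health` polls the worker every two seconds for up to 30 attempts and returns 0 as soon as the request succeeds.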