OCR Conversion
What is the purpose of this Helm chart
This chart is an example of a jadice flow product. It provides a jadice flow stack for OCR (Optical Character Recognition).
Supported Formats
The included worker-tessocr can perform OCR on the following input formats:
- TIFF, JPEG, GIF, PNG, and BMP image formats
- Multi-page TIFF images
- PDF document format
Other formats must be converted first. This can be done in a jadice flow jobtemplate. For more information see worker-tessocr.
Where to get
After you have received the credentials (technical user) for accessing our Helm chart and container registry, you can log in with these credentials at https://artifacts.jadice.com and change your password.
To add the Helm repository to your repositories, you can then execute the following command:
$ helm repo add levigo https://artifacts.jadice.com/repository/helm-charts/ --username <username>
Password: <enter your password>
When you see the password prompt, enter your password.
The Helm chart jf-ocr is now available in your Helm CLI.
There is also the option to download the Helm Chart with the command:
$ helm pull levigo/jf-ocr
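Before installing, it can help to look at the chart's default values. The following is a minimal sketch that uses only the standard Helm commands `helm repo update` and `helm show values`; the function name `inspect_chart` is hypothetical and exists only for this illustration.

```shell
# inspect_chart (hypothetical helper): refresh the repository index,
# then print the chart's default values so they can be reviewed or
# redirected into a file as a starting point for values.yaml.
inspect_chart() {
  chart="${1:-levigo/jf-ocr}"
  helm repo update >/dev/null || return 1
  helm show values "$chart"
}
```

For example, `inspect_chart > default-values.yaml` would store the defaults in a local file for comparison with your own values.yaml.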
TL;DR
You need to provide some Required Parameters. These are:
- the secrets.controller.accessToken, used by the jf-controller for authentication to the workers
- the storage credentials - e.g. S3
- the container registry credentials for the jadice flow images (usually the same as for the Helm repository, or your own private registry)
The credentials for the controller database - by default H2 - can be changed, too.
A minimal values.yaml with these required parameters would look something like this:
secrets:
  useSealedSecrets: false
  controller:
    accessToken: MY-ACCESS-TOKEN
    database:
      username: myUsername
      password: myPassword
  s3:
    bucket: jadice-flow-bucket
    endpoint: s3.acme.com
    accessKey: myAccessKey
    secretKey: mySecretKey
## uncomment when using WebDAV and remove S3 block above
#  webdav:
#    endpoint: https://webdav.acme.com/
#    username: myUser
#    password: myPassword
#    outputPath: results
## values you'll probably need, because jadice images are only available in a private registry
jadiceFlow:
  image:
    ## path to proxy registry of "registry.jadice.com"
    ## (required if direct access to "registry.jadice.com" is not possible)
    registry: jadice.proxy.registry.acme.com
    ## pull secret for proxy registry or "registry.jadice.com"
    pullSecrets:
      - "my-pull-secret"
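The pull secret named under jadiceFlow.image.pullSecrets has to exist in the cluster. A minimal sketch of creating it with the standard `kubectl create secret docker-registry` command follows, assuming the chart references an existing secret by name rather than creating it. The function name `create_pull_secret` is hypothetical.

```shell
# create_pull_secret (hypothetical helper): create a docker-registry
# secret that matches the name listed under jadiceFlow.image.pullSecrets.
# Arguments: <secret-name> <registry-host> <username> <password>
create_pull_secret() {
  if [ "$#" -ne 4 ]; then
    echo "usage: create_pull_secret <name> <registry> <username> <password>" >&2
    return 2
  fi
  kubectl create secret docker-registry "$1" \
    --docker-server="$2" \
    --docker-username="$3" \
    --docker-password="$4"
}
```

With the values shown above, a call could look like `create_pull_secret my-pull-secret jadice.proxy.registry.acme.com <username> <password>`, run in the namespace the release will be installed into.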
To install the chart from the levigo helm repository with the release name my-release:
$ helm repo add levigo https://artifacts.jadice.com/repository/helm-charts/ --username <username>
Password: <enter your password>
$ helm install --values values.yaml my-release levigo/jf-ocr
Installation and Configuration
Prerequisites
Kubernetes
- Kubernetes 1.14+
- Helm 3.1.0+
- an Ingress Controller
Storage
- The default storage for storing job results is Eureka. It requires zero configuration for test installations. It is also possible to enable persistence and to configure other storage backends.
Container Image Access
Because the images used in this chart come from a private container registry, you need to have
- access to the container registry registry.jadice.com OR a proxy of it
API Access Token
- a jadice flow access token
Installing the Chart
To install the chart with the release name my-release:
$ helm repo add levigo https://artifacts.jadice.com/repository/helm-charts/ --username <username>
Password: <enter your password>
$ helm install --values values.yaml my-release levigo/jf-ocr
The command deploys jf-ocr on the Kubernetes cluster in the default configuration. The configuration section lists the parameters that can be configured during installation.
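Individual parameters can also be overridden on the command line instead of in values.yaml. The sketch below passes the secrets.controller.accessToken parameter via Helm's standard `--set-string` flag, so that the token does not have to be stored in the file. The function name `install_with_token` is hypothetical.

```shell
# install_with_token (hypothetical helper): install the chart and pass the
# controller access token via --set-string instead of storing it in
# values.yaml. Arguments: <release-name> <access-token>
install_with_token() {
  release="$1"
  token="$2"
  if [ -z "$release" ] || [ -z "$token" ]; then
    echo "usage: install_with_token <release-name> <access-token>" >&2
    return 2
  fi
  helm install --values values.yaml \
    --set-string "secrets.controller.accessToken=$token" \
    "$release" levigo/jf-ocr
}
```

Values given with `--set-string` take precedence over the same keys in values.yaml, so the file can keep a placeholder for the token.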
Uninstalling the Chart
To uninstall the my-release deployment:
$ helm uninstall my-release
The command removes all the Kubernetes components associated with the chart and deletes the release.
Check that every pod is running and ready
You can check the status of the deployment by checking if all the pods are up and running:
kubectl get pods
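On a busy namespace it can be easier to list only the pods that are not yet ready. The following sketch filters the standard column output of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...) with awk. The function name `not_ready_pods` is hypothetical.

```shell
# not_ready_pods (hypothetical helper): print the names of pods whose
# READY column shows fewer ready containers than expected, or whose
# STATUS is not Running. Completed pods are ignored.
# Extra arguments (for example -n <namespace>) are passed to kubectl.
not_ready_pods() {
  kubectl get pods --no-headers "$@" | awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

An empty result from `not_ready_pods` suggests that every pod is running and ready.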
Check functionality with Postman
To validate your deployment, you can use the Postman
collection jadice-flow tutorial ocr.postman_collection.json that is part of this Helm chart.
You can get it by pulling the chart with 'helm pull levigo/jf-ocr'. For instructions
on how to import a collection, see Importing data into Postman.
After you have imported the collection you need to create a variable JF_CONTROLLER_URL pointing to your jf-controller
(e.g. https://jf-controller.email.acme.com) and add your access token as a Bearer Token to the collection.
Information about the jadice flow REST-API can be found here.
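Outside of Postman, the same Bearer token authentication can be tried from the command line. The following is only a sketch: it uses standard curl options, the function name `controller_status` is hypothetical, and the default request path `/` is an assumption that should be replaced with an endpoint from the jadice flow REST API documentation.

```shell
# controller_status (hypothetical helper): send a GET request with the
# access token as a Bearer token and print only the HTTP status code.
# Arguments: <controller-url> <access-token> [path]
# The default path "/" is an assumption, not a documented endpoint.
controller_status() {
  if [ "$#" -lt 2 ]; then
    echo "usage: controller_status <controller-url> <access-token> [path]" >&2
    return 2
  fi
  url="${1%/}"          # strip one trailing slash, if present
  token="$2"
  path="${3:-/}"
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $token" \
    "$url$path"
}
```

A call such as `controller_status "$JF_CONTROLLER_URL" "$ACCESS_TOKEN"` prints the status code; a 401 or 403 would typically indicate that the token was not accepted.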
Troubleshooting
If one or more pods stay in the Pending state, it could mean that they cannot be scheduled onto a node. This can happen if there are not enough resources available. You can check for messages from the scheduler with:
kubectl describe pods <POD_NAME>
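To find out which pods are affected in the first place, the pending pods and the recent events can be listed. The sketch below uses standard kubectl options (`--field-selector`, `-o custom-columns`, `--sort-by`); the function names `pending_pods` and `recent_events` are hypothetical.

```shell
# pending_pods (hypothetical helper): list the names of pods in phase
# Pending. Extra arguments (for example -n <namespace>) go to kubectl.
pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending \
    -o custom-columns=NAME:.metadata.name --no-headers "$@"
}

# recent_events (hypothetical helper): show events sorted by their last
# timestamp, which usually includes scheduling failures.
recent_events() {
  kubectl get events --sort-by=.lastTimestamp "$@"
}
```

Each name printed by `pending_pods` can then be passed to kubectl describe pods as `<POD_NAME>`.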
If the output of kubectl describe does not provide enough information, you can tail the logs of a pod with this command:
kubectl logs -f <POD_NAME>
Included Workers
More Information
For more information, check out the Readme.md included in the Helm chart. You can find the Readme.md in the tgz file that can be downloaded with the command:
helm pull levigo/jf-ocr
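The downloaded archive can be inspected without unpacking it. The following sketch lists README-like entries with standard tar and grep; the function name `list_readme` is hypothetical, and the match is case-insensitive because the exact casing of the file name inside the archive is not assumed.

```shell
# list_readme (hypothetical helper): list README-like entries in a chart
# archive without unpacking it.
# Argument: path to the .tgz file created by "helm pull".
list_readme() {
  tar -tzf "$1" | grep -i 'readme'
}
```

For example, `list_readme jf-ocr-<version>.tgz` prints the path of the Readme inside the archive. Alternatively, `helm show readme levigo/jf-ocr` prints the chart's README directly from the repository.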