Scaling in Output Organizer
Description
Scaling the Output Organizer application for production environments requires careful consideration of the database and storage solutions. The default configuration uses local storage and a local database for an easy first deployment and environment-independent operation. For production we recommend deploying at least 2 pods of each kind to ensure reliability and scalability. See the Sizing article for exact recommendations.
Storage Configuration
If you use the upload functionality of Output Organizer (via the file browser or drag and drop) to import local files, you need to change the storage method when scaling the Organizer backend to multiple instances.
For this we currently require an existing storage solution with an S3 API. If an S3 instance is already available, a new bucket can be created to store Organizer uploads. If you don't have an existing S3 storage, we recommend a Docker version of MinIO, for example https://hub.docker.com/r/bitnami/minio.
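As a sketch, a single-node MinIO suitable for testing can be started with Docker Compose. The credentials, ports, and bucket name below are placeholders to adapt; the `MINIO_DEFAULT_BUCKETS` variable of the Bitnami image creates the bucket at startup:

```yaml
# docker-compose.yaml - single-node MinIO for Organizer uploads (sketch)
services:
  minio:
    image: bitnami/minio:latest
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: organizer            # placeholder credentials
      MINIO_ROOT_PASSWORD: change-me-now
      MINIO_DEFAULT_BUCKETS: organizer-uploads  # bucket created at startup
    volumes:
      - minio-data:/bitnami/minio/data
volumes:
  minio-data:
```

For production use, a replicated MinIO deployment or an existing S3 service is preferable to a single node.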
Configuration Values
For the Organizer to access the storage, a user and a bucket have to be created on the storage provider system. Once this is done, the credentials and connection information can be configured in our Helm chart as follows:
```yaml
organizer:
  storage:
    s3:
      enabled: true
  secrets:
    s3:
      existingSecret: # Can be used instead of the following fields if an existing secret should be used; otherwise it can be omitted
      bucket: # existing bucket name to store uploads in
      endpoint: # S3 endpoint for server-to-server communication
      accessKey: # credential access key
      secretKey: # credential secret key
```
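For example, with a MinIO instance reachable inside the cluster, the values could look like this. The service name, bucket, and credentials are placeholders; the nesting follows the chart fragment above:

```yaml
organizer:
  storage:
    s3:
      enabled: true
  secrets:
    s3:
      bucket: organizer-uploads    # bucket created beforehand on the storage system
      endpoint: http://minio:9000  # assumed in-cluster MinIO service
      accessKey: organizer         # placeholder credentials
      secretKey: change-me
```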
Database Configuration
To allow Output Organizer to scale horizontally, we have to replace the local H2 database so that all instances can access the same data. Multiple databases are supported: MariaDB, MySQL, PostgreSQL, and MSSQL. All of them are comparable in performance, and you are free to choose any. If a supported database is already available, we recommend using it for Output Organizer. If you don't have an existing database, we recommend a Docker version of MariaDB, for example https://hub.docker.com/r/bitnami/mariadb, which can be hosted alongside Output Organizer.
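As a sketch, a MariaDB instance can be started with Docker Compose using the Bitnami image. Database name and credentials below are placeholders to adapt:

```yaml
# docker-compose.yaml - MariaDB for Output Organizer (sketch)
services:
  mariadb:
    image: bitnami/mariadb:latest
    ports:
      - "3306:3306"
    environment:
      MARIADB_ROOT_PASSWORD: change-me-root
      MARIADB_DATABASE: organizer   # database the Organizer will use
      MARIADB_USER: organizer       # placeholder credentials
      MARIADB_PASSWORD: change-me
    volumes:
      - mariadb-data:/bitnami/mariadb
volumes:
  mariadb-data:
```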
The database can be configured using a JDBC URL, a Driver Class, and a DB Dialect. Configuration follows the standard Spring Boot JPA setup. Please refer to the database configuration section in the installation manual.
Configuration Values
In your values.yaml file, you can configure the following properties:
```yaml
organizer:
  db:
    jdbcURL: ChangeMe # Replace with your database JDBC URL
    driverClassName: ChangeMe # Replace with your database driver class name
    databasePlatform: ChangeMe # Replace with your database dialect
```
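For example, for a MariaDB instance reachable as `mariadb` with a database named `organizer` (host and database name are assumptions; the driver class and dialect are the standard MariaDB values in a Spring Boot JPA setup):

```yaml
organizer:
  db:
    jdbcURL: jdbc:mariadb://mariadb:3306/organizer
    driverClassName: org.mariadb.jdbc.Driver
    databasePlatform: org.hibernate.dialect.MariaDBDialect
```

The database user's credentials are configured separately; see the database configuration section in the installation manual.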
Alternatives
If you are sure that collections are not used across multiple sessions, it is also possible to keep the H2 database while scaling up the instance count. In that case, we need to ensure that a single session is always routed to the same server backend; see Session Stickiness below. This may be easier to set up and requires no additional configuration for the Organizer, but there are significant downsides to this approach:
- collections cannot be prepared by a server-side actor
- collections cannot be reopened after the session expires
- collections cannot be saved for long-term use
- it may cause higher loads on Organizer pods
Session Stickiness
Sticky sessions (also known as session affinity) ensure that requests from the same user within a session are always routed to the same server instance.
Session stickiness can be configured either in the load balancer or in the ingress plugin inside the cluster environment. The exact configuration varies depending on the components used and has to be set up accordingly.
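As an illustration, the NGINX Ingress Controller supports cookie-based session affinity through annotations. The Ingress name, cookie settings, and service port below are examples to adapt:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: organizer-viewer-ingress   # example name
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "ORGANIZER_AFFINITY"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"  # seconds
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fusion-output-organizer-viewer
                port:
                  number: 80   # assumed service port
```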
Viewer
The Jadice Web Toolkit is used as the main viewer in most installations. For better performance and efficiency, the viewer caches document data locally; it is therefore recommended to ensure that all traffic for the same session is kept to a single instance.
The Kubernetes service name that has to be handled this way is fusion-output-organizer-viewer.
Organizer
Session stickiness is required for the Organizer only if no external database is configured, as described in the Alternatives section of the database configuration. This ensures that all requests from a user are served by the same local database.
The Kubernetes service name that has to be handled this way is fusion-output-organizer.
Scaling instances
Once all prerequisites are handled, we are able to scale the instances of all Organizer services.
In the Output Organizer Helm chart, every subcomponent has a replicaCount field that is used to increase the scaling as needed:
```yaml
organizer:
  replicaCount: # number of organizer pods
viewer:
  replicaCount: # number of viewer pods
controller:
  replicaCount: # number of controller pods
worker-topdf:
  replicaCount: # number of export worker pods
```
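Following the recommendation of at least two pods of each kind, a minimal production sizing could look like this (adjust the exact numbers per the Sizing article):

```yaml
organizer:
  replicaCount: 2
viewer:
  replicaCount: 2
controller:
  replicaCount: 2
worker-topdf:
  replicaCount: 2
```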