Scaling in Output Organizer
Description
Scaling the Output Organizer application for production environments requires careful consideration of the database and storage solutions. The default configuration uses local storage and a local database for an easy first deployment and environment-independent operation. For production we recommend deploying at least 2 pods of each kind to ensure reliability and scalability. See the Sizing article for exact recommendations.
Storage Configuration
If you use the upload functionality of Output Organizer (via the file browser or drag and drop) to import local files, you need to change the storage method when scaling the Organizer backend to multiple instances.
For this we currently require an existing storage solution with an S3 API. If an S3 instance is already available, a new bucket can be created to store Organizer uploads. If you don't have an existing S3 storage, we recommend a Docker version of MinIO, for example https://hub.docker.com/r/bitnami/minio.
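As a sketch, a single-node MinIO suitable for testing can be started with Docker Compose. The credentials, ports, and bucket name below are placeholders to adapt; the `MINIO_DEFAULT_BUCKETS` variable of the Bitnami image creates the bucket at startup:

```yaml
# docker-compose.yaml - single-node MinIO for Organizer uploads (sketch)
services:
  minio:
    image: bitnami/minio:latest
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: organizer            # placeholder credentials
      MINIO_ROOT_PASSWORD: change-me-now
      MINIO_DEFAULT_BUCKETS: organizer-uploads  # bucket created at startup
    volumes:
      - minio-data:/bitnami/minio/data
volumes:
  minio-data:
```

For production use, a replicated MinIO deployment or an existing S3 service is preferable to a single node.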
Configuration Values
For the Organizer to access the storage, a user and a bucket have to be created on the storage provider system. Once this is done, the credentials and connection information can be configured in our Helm chart as follows:
```yaml
organizer:
  storage:
    s3:
      enabled: true
  secrets:
    s3:
      existingSecret: # Can be used instead of the following fields if an existing secret should be used; otherwise it can be omitted
      bucket: # existing bucket name to store uploads in
      endpoint: # S3 endpoint for server-to-server communication
      accessKey: # credential access key
      secretKey: # credential secret key
```
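For example, with a MinIO instance reachable inside the cluster, the values could look like this. The service name, bucket, and credentials are placeholders; the nesting follows the chart fragment above:

```yaml
organizer:
  storage:
    s3:
      enabled: true
  secrets:
    s3:
      bucket: organizer-uploads    # bucket created beforehand on the storage system
      endpoint: http://minio:9000  # assumed in-cluster MinIO service
      accessKey: organizer         # placeholder credentials
      secretKey: change-me
```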
Database Configuration
To allow Output Organizer to scale horizontally, we have to replace the local H2 database so that all instances can access the same data. Multiple databases are supported: MariaDB, MySQL, PostgreSQL, and MSSQL. All of them are comparable in performance, and you are free to choose any. If a supported database is already available, we recommend using it for Output Organizer. If you don't have an existing database, we recommend a Docker version of MariaDB, for example https://hub.docker.com/r/bitnami/mariadb, which can be hosted alongside Output Organizer.
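As a sketch, a MariaDB instance can be started with Docker Compose using the Bitnami image. Database name and credentials below are placeholders to adapt:

```yaml
# docker-compose.yaml - MariaDB for Output Organizer (sketch)
services:
  mariadb:
    image: bitnami/mariadb:latest
    ports:
      - "3306:3306"
    environment:
      MARIADB_ROOT_PASSWORD: change-me-root
      MARIADB_DATABASE: organizer   # database the Organizer will use
      MARIADB_USER: organizer       # placeholder credentials
      MARIADB_PASSWORD: change-me
    volumes:
      - mariadb-data:/bitnami/mariadb
volumes:
  mariadb-data:
```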
The database can be configured using a JDBC URL, a Driver Class, and a DB Dialect. Configuration follows the standard Spring Boot JPA setup. Please refer to the database configuration section in the installation manual.
Configuration Values
In your values.yaml file, you can configure the following properties:
```yaml
organizer:
  db:
    jdbcURL: ChangeMe # Replace with your database JDBC URL
    driverClassName: ChangeMe # Replace with your database driver class name
    databasePlatform: ChangeMe # Replace with your database dialect
```
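For example, for a MariaDB instance reachable as `mariadb` with a database named `organizer` (host and database name are assumptions; the driver class and dialect are the standard MariaDB values in a Spring Boot JPA setup):

```yaml
organizer:
  db:
    jdbcURL: jdbc:mariadb://mariadb:3306/organizer
    driverClassName: org.mariadb.jdbc.Driver
    databasePlatform: org.hibernate.dialect.MariaDBDialect
```

The database user's credentials are configured separately; see the database configuration section in the installation manual.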
Alternatives
If you are sure that collections are not used across multiple sessions, it is also possible to keep the H2 database while scaling up the instance count. In that case, we need to ensure that a single session is always routed to the same server backend; see Session Stickiness below. This may be easier to set up and requires no additional configuration for the Organizer, but there are significant downsides to this approach:
- collections cannot be prepared by a server-side actor
- collections cannot be reopened after the session expires
- collections cannot be saved for long-term use
- it may cause higher loads on Organizer pods
Session Stickiness
Sticky sessions (also known as session affinity) ensure that requests from the same user within a session are always routed to the same server instance.
Session stickiness can be configured either in the load balancer or in the ingress plugin inside the cluster environment. The exact configuration varies depending on the components used and has to be set up accordingly.
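As an illustration, the NGINX Ingress Controller supports cookie-based session affinity through annotations. The Ingress name, cookie settings, and service port below are examples to adapt:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: organizer-viewer-ingress   # example name
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "ORGANIZER_AFFINITY"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"  # seconds
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fusion-output-organizer-viewer
                port:
                  number: 80   # assumed service port
```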
Viewer
The Jadice Web Toolkit is used as the main viewer in most installations. For better performance and efficiency, the viewer caches document data locally; it is therefore recommended to ensure that all traffic for the same session is kept to a single instance.
The Kubernetes service name that has to be handled this way is fusion-output-organizer-viewer.
Organizer
Session stickiness is required for the Organizer only if no external database is configured, as described in the Alternatives section of the database configuration. This ensures that all requests from a user are served by the same local database.
The Kubernetes service name that has to be handled this way is fusion-output-organizer.
Scaling instances
Once all prerequisites are handled, we are able to scale the instances of all Organizer services.
In the Output Organizer Helm chart, every subcomponent has a replicaCount field that is used to increase the scaling as needed:
```yaml
organizer:
  replicaCount: # number of organizer pods
viewer:
  replicaCount: # number of viewer pods
controller:
  replicaCount: # number of controller pods
worker-topdf:
  replicaCount: # number of export worker pods
```
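Following the recommendation of at least two pods of each kind, a minimal production sizing could look like this (adjust the exact numbers per the Sizing article):

```yaml
organizer:
  replicaCount: 2
viewer:
  replicaCount: 2
controller:
  replicaCount: 2
worker-topdf:
  replicaCount: 2
```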