History Compaction
Each collection edit (via UI or REST API) triggers the creation of a snapshot, which is stored in the history table. Over time, this can accumulate large amounts of redundant data, especially for frequently updated collections.
The History Compaction Job is a housekeeping component designed to optimize storage by cleaning up outdated history versions of collections in the system.
This job helps maintain a lean database by:
- Retaining only the latest version per day for each collection.
- Allowing configurable retention of complete history for a recent period (e.g., last 30 days).
- Executing in a safe, performant, and transactional manner using batch processing.
Operational Workflow
Once enabled, the compaction job runs automatically on its configured schedule and works through the history one calendar day at a time to remove old versions of collections:
- Start from the earliest saved date: The job checks the oldest date in the history records and begins with collections saved on that day. If the job has run before, it resumes from the last processed date.
- Keep only the most recent version per day: For each collection saved on that day, it identifies the last version saved and marks all earlier versions from that same day for deletion.
- Delete outdated versions in batches: To avoid overloading the system, the deletions are done in small batches, helping keep the process efficient and stable.
- Move to the next day: After finishing one day’s cleanup, the job saves the date of the next day to be processed.
- Repeat until done: This cycle continues until either:
  - The maximum allowed job duration (maxDurationInHours) is reached, or
  - The job has processed all data older than the configured retention period (keepDays).
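The steps above can be sketched in a few lines of Python. This is only an illustrative model, not the job's actual implementation: the in-memory list stands in for the history table, the `(snapshot_id, collection_id, saved_at)` tuple layout and the function names `versions_to_delete` and `compact` are hypothetical, and treating the cutoff day itself as protected is an assumption.

```python
import time
from collections import defaultdict
from datetime import date, datetime, timedelta


def versions_to_delete(snapshots, day):
    """Return ids of all but the latest snapshot per collection saved on `day`."""
    by_collection = defaultdict(list)
    for snap_id, collection_id, saved_at in snapshots:
        if saved_at.date() == day:
            by_collection[collection_id].append((saved_at, snap_id))
    doomed = []
    for entries in by_collection.values():
        entries.sort()  # oldest first; the last entry is the one to keep
        doomed.extend(snap_id for _, snap_id in entries[:-1])
    return doomed


def compact(snapshots, today, keep_days=30, batch_size=5000,
            max_duration_hours=3, start_from=None):
    """Simulate one job run. Returns (remaining snapshots, next day to process)."""
    if not snapshots:
        return [], start_from
    # Assumption: days on or after the cutoff keep their full history.
    cutoff = today - timedelta(days=keep_days)
    deadline = time.monotonic() + max_duration_hours * 3600
    # Resume from the saved date if there is one, else start at the oldest day.
    day = start_from if start_from is not None else min(s[2].date() for s in snapshots)
    remaining = list(snapshots)
    while day < cutoff and time.monotonic() < deadline:
        doomed = versions_to_delete(remaining, day)
        # Delete in batches; in a real database each batch would be one transaction.
        for i in range(0, len(doomed), batch_size):
            batch = set(doomed[i:i + batch_size])
            remaining = [s for s in remaining if s[0] not in batch]
        day += timedelta(days=1)  # this date would be persisted for the next run
    return remaining, day


history = [
    (1, "a", datetime(2024, 1, 1, 9, 0)),
    (2, "a", datetime(2024, 1, 1, 17, 0)),
    (3, "b", datetime(2024, 1, 1, 10, 0)),
    (4, "a", datetime(2024, 1, 2, 8, 0)),
    (5, "a", datetime(2024, 1, 2, 12, 0)),
    (6, "a", datetime(2024, 3, 1, 8, 0)),
    (7, "a", datetime(2024, 3, 1, 9, 0)),
]
remaining, next_day = compact(history, today=date(2024, 3, 10), keep_days=30)
print([s[0] for s in remaining])  # [2, 3, 5, 6, 7]
print(next_day)                   # 2024-02-09
```

In this run, snapshots 1 and 4 are removed because a later version of collection "a" exists on the same day, while snapshots 6 and 7 both survive because they fall inside the 30-day retention window.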
Configuration (Helm Values)
The compaction job is configured under the organizer.jobs.compactHistory section in your Helm chart:
organizer:
  jobs:
    compactHistory:
      enabled: true                     # Boolean flag to activate the job (default: false)
      cron: "0 0 3 * * *"               # Cron expression defining the execution schedule
      keepDays: 30                      # Number of recent days for which all versions are retained
      batchSize: 5000                   # Max number of deletions in a single batch transaction
      maxDurationInHours: 3             # Maximum duration for the job to run
      transactionTimeoutInSeconds: 300  # Timeout per deletion transaction (in seconds)
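To get a feel for what these values mean in practice, the short Python sketch below works through the arithmetic for keepDays and batchSize. The helper names `retention_cutoff` and `batches_needed` are hypothetical, the run date and deletion count are made-up inputs, and the exact handling of the cutoff day by the real job is an assumption.

```python
import math
from datetime import date, timedelta

# Values copied from the example configuration.
config = {"keepDays": 30, "batchSize": 5000}


def retention_cutoff(run_date, keep_days):
    """First day of the window in which all versions are retained (assumed boundary)."""
    return run_date - timedelta(days=keep_days)


def batches_needed(pending_deletions, batch_size):
    """Number of batch transactions required to delete `pending_deletions` rows."""
    return math.ceil(pending_deletions / batch_size)


cutoff = retention_cutoff(date(2024, 3, 10), config["keepDays"])
print(cutoff)                                       # 2024-02-09
print(batches_needed(12_345, config["batchSize"]))  # 3
```

Under these assumptions, a run on 2024-03-10 compacts only history saved before 2024-02-09, and 12,345 pending deletions are split into three transactions of at most 5,000 rows each.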