Automated Deletion
This tutorial guides you through the process of setting collection states to Active, Closed or MarkedForDeletion, and configuring the grace period and execution interval/time for the deletion job via Helm.
Collection Deletion Job
The Collection Deletion Job removes collections that have been marked for deletion once their grace period has elapsed.
Setting Collection State
To manage the lifecycle of collections, you can set their state to Active, Closed or MarkedForDeletion. The state can be set when a collection is created or updated via the REST APIs.
- Active: Indicates that the collection is currently active and can be changed.
- Closed: Indicates that the collection is no longer active but is not yet marked for deletion. Changes to this collection are not possible via the UI, only via the REST APIs.
- MarkedForDeletion: Indicates that the collection is scheduled for deletion once the grace period has elapsed. Changes to this collection are not possible via the UI, only via the REST APIs.
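The three states and their effect on editing can be summarized in a small model. This is only an illustrative sketch: the enum and helper names are hypothetical and are not part of the product's API. It encodes just what the list above states, namely that only Active collections can be changed via the UI and only MarkedForDeletion collections are candidates for the deletion job.

```python
from enum import Enum


class CollectionState(Enum):
    """Collection lifecycle states named in this tutorial (hypothetical model)."""

    ACTIVE = "Active"
    CLOSED = "Closed"
    MARKED_FOR_DELETION = "MarkedForDeletion"


def editable_via_ui(state: CollectionState) -> bool:
    # Only Active collections can be changed through the UI;
    # Closed and MarkedForDeletion collections can only be changed via REST.
    return state is CollectionState.ACTIVE


def candidate_for_deletion_job(state: CollectionState) -> bool:
    # Only collections marked for deletion are candidates for the job.
    # The grace period must also have elapsed, which is not modelled here.
    return state is CollectionState.MARKED_FOR_DELETION
```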
Deleting a collection removes only the collection itself and everything associated with it (bookmarks, annotations, page order, brightness, rotation, historic versions, "seen/unseen" information, etc.). The original documents are not deleted.
To automatically clean up documents stored on PVC/filesystem backends, see PVC Document Cleanup below.
Configuration via Helm
You can configure the grace period and the execution interval/time for the deletion job using Helm. Below is an example Helm configuration:
jobs:
  deletion:
    enabled: false # Set to true to enable the deletion job
    cron: "0 0 0 * * ?" # Default cron expression for nightly deletion at midnight
    gracePeriodInDays: 1 # Default grace period of 1 day
    transactionBatchSize: 500 # Default batch size for deletion transactions
- enabled: Enables or disables the deletion job.
- cron: Specifies the cron expression for scheduling the deletion job. The default value 0 0 0 * * ? schedules the job to run every day at midnight.
- gracePeriodInDays: Specifies the number of days a collection must be marked for deletion before it is actually deleted. The default value is 1 day.
- transactionBatchSize: Specifies the batch size for deletion transactions. The default value is 500.
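To make the interplay of gracePeriodInDays and transactionBatchSize concrete, here is a minimal sketch of the selection logic such a job might apply. It is not the actual implementation. The function names, the idea of a "marked at" timestamp, and the inclusive boundary (a collection becomes due exactly when the grace period is reached) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Sequence


def grace_period_elapsed(marked_at: datetime, now: datetime,
                         grace_period_in_days: int) -> bool:
    # Assumption: a collection is due once it has been marked for deletion
    # for at least gracePeriodInDays full days.
    return now - marked_at >= timedelta(days=grace_period_in_days)


def in_batches(ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    # Split the due collection IDs into chunks of at most batch_size,
    # mirroring the idea of deleting in transactions of limited size.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])
```

With a grace period of 7 days, a collection marked nine days ago would be selected, while one marked yesterday would be left for a later run.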
Example
To enable the deletion job and set a custom schedule, modify the Helm values as needed. For instance, to run the deletion job every Friday at midnight with a grace period of 7 days:
jobs:
  deletion:
    enabled: true
    cron: "0 0 0 * * FRI"
    gracePeriodInDays: 7
    transactionBatchSize: 500 # Default batch size for deletion transactions
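The cron expressions in this tutorial have six fields. Judging by the defaults and their descriptions (0 0 0 * * ? for midnight), the layout appears to be seconds first, as in Quartz- or Spring-style expressions. That is an assumption, not something this page states. Under that assumption, the following sketch labels each field so an expression is easier to read. The helper name is hypothetical, and the function only splits the expression; it does not validate it.

```python
# Assumed field order for six-field, seconds-first cron expressions.
CRON_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    # Split on whitespace and pair each part with its assumed field name.
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))
```

For the example above, the hour field is 0 and the day_of_week field is FRI, which matches "every Friday at midnight".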
PVC Document Cleanup
Dossier Organizer's internal storage (PVC/filesystem) holds export results produced by operations such as flow-based exports. These files accumulate over time and keep occupying storage even though they are no longer needed.
The PVC Document Cleanup Job runs on a schedule and removes such stale documents once they are older than a configurable retention period.
Note: This job is only relevant when using a filesystem or EhCache backend (i.e. PVC storage). It is not needed for S3-backed deployments, where storage lifecycle policies can be used instead. Which backend is in use is determined by your Storage Configuration, so review that page first; it is a precondition for this job.
Configuration via Helm
jobs:
  pvcDocumentCleanup:
    enabled: false # Set to true to enable the cleanup job
    cron: "0 0 2 * * ?" # Default cron expression, runs nightly at 2am
    retentionPeriod: 30d # Documents younger than this are never deleted
- enabled: Enables or disables the job. Defaults to false.
- cron: Cron expression controlling when the job runs. The default 0 0 2 * * ? schedules it every night at 2:00 AM.
- retentionPeriod: Minimum age a document must have before it is eligible for deletion. This safety margin prevents recently produced export results from being deleted while they may still be downloaded. Accepts a duration with a unit suffix: d (days), h (hours), m (minutes) or s (seconds), so sub-day retention such as 12h or 90m is supported. Defaults to 30d.
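The retentionPeriod format described above can be illustrated with a short parser. This is a sketch under stated assumptions, not the product's own parsing code. It accepts a single whole number followed by one of the four suffixes, and it treats a document as eligible once its age reaches the retention period. The function names and the use of a last-modified timestamp to measure age are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Map each unit suffix from the documentation to a timedelta keyword.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
_PATTERN = re.compile(r"(\d+)([dhms])")


def parse_retention(value: str) -> timedelta:
    # Accept values such as "30d", "12h", "90m" or "45s".
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported retention period: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def eligible_for_cleanup(last_modified: datetime, now: datetime,
                         retention: timedelta) -> bool:
    # Assumption: age is measured from a last-modified timestamp, and a
    # document becomes eligible once its age reaches the retention period.
    return now - last_modified >= retention
```

With retentionPeriod: 12h, an export result written 13 hours before the job runs would be eligible, while one written an hour ago would be kept.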
Example
To enable the job with a 14-day retention period, running every Sunday at 3:00 AM:
jobs:
  pvcDocumentCleanup:
    enabled: true
    cron: "0 0 3 * * SUN"
    retentionPeriod: 14d