
Automated Deletion

This tutorial guides you through the process of setting collection states to Active, Closed or MarkedForDeletion, and configuring the grace period and execution interval/time for the deletion job via Helm.

Setting Collection State

To manage the lifecycle of collections, you can set their state to Active, Closed or MarkedForDeletion. The state can be set when a collection is created or updated via the REST APIs.

  • Active: Indicates that the collection is currently active and can be changed.
  • Closed: Indicates that the collection is no longer active but not yet marked for deletion. The collection can no longer be changed via the UI; changes are only possible via the REST APIs.
  • MarkedForDeletion: Indicates that the collection is scheduled for deletion once the grace period has elapsed. The collection can no longer be changed via the UI; changes are only possible via the REST APIs.

When a collection is deleted, only the collection and everything associated with it (bookmarks, annotations, page order, brightness, rotation, historic versions, "seen/unseen" information, etc.) are deleted. The original documents themselves are not deleted.

To automatically clean up documents stored on PVC/filesystem backends, see PVC Document Cleanup below.

Configuring Automated Deletion via Helm

You can configure the grace period and the execution interval/time for the deletion job using Helm. Below is an example Helm configuration:

jobs:
  deletion:
    enabled: false # Set to true to enable the deletion job
    cron: "0 0 0 * * ?" # Default cron expression for nightly deletion at midnight
    gracePeriodInDays: 1 # Default grace period of 1 day
    transactionBatchSize: 500 # Default batch size for deletion transactions

  • enabled: Enables or disables the deletion job.
  • cron: Specifies the cron expression for scheduling the deletion job. The default value 0 0 0 * * ? schedules the job to run every day at midnight.
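  • gracePeriodInDays: Specifies the number of days a collection must be marked for deletion before it is actually deleted. The default value is 1 day.
  • transactionBatchSize: Specifies the batch size for deletion transactions. The default value is 500.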

Example Usage

To enable the deletion job and set a custom schedule, modify the Helm values as needed. For instance, to run the deletion job every Friday at midnight with a grace period of 7 days:

jobs:
  deletion:
    enabled: true
    cron: "0 0 0 * * FRI"
    gracePeriodInDays: 7
    transactionBatchSize: 500 # Default batch size for deletion transactions
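
With a weekly schedule, the effective delay before deletion can be noticeably longer than the grace period itself. The annotated values below sketch this interaction. They assume that the job only evaluates collections when it runs and that the grace period is counted from the moment the collection was set to MarkedForDeletion; neither assumption is confirmed by this page, and the weekdays in the comments are purely illustrative.

```yaml
jobs:
  deletion:
    enabled: true
    cron: "0 0 0 * * FRI"   # job runs once a week, Friday at midnight
    gracePeriodInDays: 7    # collection must have been marked for at least 7 days

# Illustrative timeline (assumptions, see the paragraph above):
#   Wednesday        collection is set to MarkedForDeletion
#   first Friday     job runs; marked for fewer than 7 days -> collection is kept
#   second Friday    job runs; marked for more than 7 days  -> collection is deleted
```

Under these assumptions, a shorter delay requires a more frequent schedule, not only a smaller grace period.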

PVC Document Cleanup

Documents uploaded to Dossier Organizer's internal storage (PVC/filesystem) during operations such as flow-based exports may become orphaned over time — for example, if an export job completes or fails without the collection being deleted. These documents are no longer referenced by any collection element but still occupy storage.

The PVC Document Cleanup Job runs on a schedule and removes such orphaned documents once they are older than a configurable retention period.

Note: This job is only relevant when using a filesystem or EhCache backend (i.e. PVC storage). It is not needed for S3-backed deployments, where storage lifecycle policies can be used instead. Which backend is in use is determined by your Storage Configuration, so review that page first — it is a precondition for this job.

Configuration via Helm

jobs:
  pvcDocumentCleanup:
    enabled: false # Set to true to enable the cleanup job
    cron: "0 0 2 * * ?" # Default cron expression, runs nightly at 2am
    retentionPeriod: 30d # Documents younger than this are never deleted
    skipCollectionCheck: true # Set to false to protect documents still referenced by a collection

  • enabled: Enables or disables the job. Defaults to false.
  • cron: Cron expression controlling when the job runs. The default 0 0 2 * * ? schedules it every night at 2:00 AM.
  • retentionPeriod: Minimum age a document must have before it is eligible for deletion. This safety margin prevents newly uploaded documents from being deleted before they are referenced. Accepts a duration with a unit suffix — d (days), h (hours), m (minutes) or s (seconds) — so sub-day retention such as 12h or 90m is supported. Defaults to 30d.
  • skipCollectionCheck: When true (default), all documents older than the retention period are deleted unconditionally without scanning collections first. Set to false to protect documents that are still referenced by an active collection element.
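
As a minimal sketch of the unit suffixes described above, the following values keep the default nightly schedule but shorten the retention period to 12 hours. The chosen duration is only an illustration, not a recommendation.

```yaml
jobs:
  pvcDocumentCleanup:
    enabled: true
    cron: "0 0 2 * * ?"       # default schedule: every night at 2:00 AM
    retentionPeriod: 12h      # sub-day retention using the h (hours) suffix
    skipCollectionCheck: true # default: delete by age only, without scanning collections
```

Because skipCollectionCheck is left at its default here, any document older than 12 hours is removed on the next run, whether or not a collection still references it.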

Performance note: Setting skipCollectionCheck: false makes the job scan every collection and every element to build the set of referenced documents before deleting anything. On deployments with a large number of collections, or with very large collections, this scan can be slow and memory-intensive.
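
If referenced documents need to be protected, one way to limit the impact of the scan is to run the job less often and at a quiet time. The sketch below combines skipCollectionCheck: false with a weekly schedule; the schedule and retention value are assumptions chosen for illustration and should be adapted to the size of the deployment.

```yaml
jobs:
  pvcDocumentCleanup:
    enabled: true
    cron: "0 0 3 * * SUN"      # weekly, Sunday at 3:00 AM, to keep the scan off peak hours
    retentionPeriod: 30d       # default retention period
    skipCollectionCheck: false # scan collections first and keep documents that are still referenced
```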

Example

To enable the job with a 14-day retention period, running every Sunday at 3:00 AM:

jobs:
  pvcDocumentCleanup:
    enabled: true
    cron: "0 0 3 * * SUN"
    retentionPeriod: 14d

Both jobs can be enabled simultaneously. When a collection is deleted by the Automated Deletion job, its documents become unreferenced and are removed by a subsequent PVC Document Cleanup run once they are older than the configured retention period.
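
A combined configuration might look like the following sketch. It reuses only the keys documented on this page; the specific schedules and periods are illustrative assumptions.

```yaml
jobs:
  deletion:
    enabled: true
    cron: "0 0 0 * * ?"        # nightly at midnight: delete collections past their grace period
    gracePeriodInDays: 7
    transactionBatchSize: 500  # default batch size
  pvcDocumentCleanup:
    enabled: true
    cron: "0 0 2 * * ?"        # nightly at 2:00 AM, after the deletion job has been triggered
    retentionPeriod: 14d       # documents younger than 14 days are never deleted
    skipCollectionCheck: false # keep documents that a remaining collection still references
```

Scheduling the cleanup after the deletion job means documents released by that night's collection deletions can be considered in the same night, provided they already exceed the retention period.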