Automated Deletion
This tutorial shows how to set a collection's state to Active, Closed or MarkedForDeletion, and how to configure the grace period and the schedule (interval or time of day) of the deletion job via Helm.
Setting Collection State
To manage the lifecycle of collections, you can set their state to Active, Closed or MarkedForDeletion. The state can be set when a collection is created or updated via the REST APIs.
- Active: Indicates that the collection is currently active and can be changed.
- Closed: Indicates that the collection is no longer active but not yet marked for deletion. Changes to this collection are not allowed via the UI, only via the REST APIs.
- MarkedForDeletion: Indicates that the collection is scheduled for deletion once the grace period has elapsed. Changes to this collection are not allowed via the UI, only via the REST APIs.
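The editing rules for the three states can be summarized as a small lookup. The following Python sketch is illustrative only: the CollectionState and Channel names are hypothetical and not part of the Output Organizer API, and it assumes that Active collections may be changed through both the UI and the REST APIs.

```python
from enum import Enum


class CollectionState(Enum):
    """The three lifecycle states described above."""
    ACTIVE = "Active"
    CLOSED = "Closed"
    MARKED_FOR_DELETION = "MarkedForDeletion"


class Channel(Enum):
    """Hypothetical labels for the two ways a collection can be changed."""
    UI = "ui"
    REST = "rest"


def can_modify(state: CollectionState, channel: Channel) -> bool:
    """Return True if a collection in `state` may be changed via `channel`.

    Assumption: Active collections can be changed through any channel.
    Closed and MarkedForDeletion collections only through the REST APIs.
    """
    if state is CollectionState.ACTIVE:
        return True
    return channel is Channel.REST


if __name__ == "__main__":
    # Print the full state/channel matrix.
    for state in CollectionState:
        for channel in Channel:
            print(f"{state.value:<18} via {channel.value:<4}: {can_modify(state, channel)}")
```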
When a collection is deleted, only the collection and everything stored within it (annotations, page order, rotation, etc.) are deleted. Internal documents (user-uploaded documents) are deleted as well, while documents stored in external sources persist.
Deletion Process
The automated deletion process follows a simple approach:
- Internal Document Cleanup:
  - All internal documents (stored in S3) are deleted
  - If ANY document deletion fails → entire collection deletion is aborted
  - Collection remains in database
- Database Deletion:
  - Only if all prior steps succeed → collection is deleted from database
  - Collection history is cleaned up
  - Collection seen/unseen states are cleaned up
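The all-or-nothing behavior of these two steps can be sketched in Python as follows. The Document, Collection and DeletionAborted types, the delete_from_s3 callable and the dict-based database are hypothetical stand-ins, not Output Organizer internals. In this sketch, internal documents deleted before a failure are not restored; only the collection record and its related data are kept.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    doc_id: str
    internal: bool  # True for user-uploaded documents stored in S3


@dataclass
class Collection:
    collection_id: str
    documents: list[Document] = field(default_factory=list)


class DeletionAborted(Exception):
    """Raised when an internal document could not be deleted."""


def delete_collection(
    collection: Collection,
    delete_from_s3: Callable[[str], None],  # hypothetical: raises on failure
    database: dict[str, dict],              # hypothetical in-memory store
) -> None:
    # Step 1: internal document cleanup. Documents from external sources
    # are skipped because they must persist.
    for doc in collection.documents:
        if not doc.internal:
            continue
        try:
            delete_from_s3(doc.doc_id)
        except Exception as exc:
            # Any failure aborts the whole deletion and the collection stays
            # in the database. Documents deleted earlier are not restored here.
            raise DeletionAborted(
                f"collection {collection.collection_id} kept: "
                f"failed to delete document {doc.doc_id}"
            ) from exc

    # Step 2: database deletion, reached only if every internal document
    # was deleted successfully.
    cid = collection.collection_id
    database["collections"].pop(cid, None)
    database["history"].pop(cid, None)      # collection history
    database["seen_states"].pop(cid, None)  # seen/unseen states
```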
Document Cleanup
Output Organizer (OO) and jadice flow may create temporary documents during processing and export operations. These include:
- Uploaded documents in OO
- Exported PDFs and other temporary files generated by jadice flow
Cleaning Up Temporary Documents
- Output Organizer: Documents uploaded to OO and belonging to collections marked for deletion are automatically cleaned up according to the configured grace period and deletion job settings (see below).
- jadice flow: jadice flow provides its own automatic job deletion mechanism, which can be configured to remove jobs and their associated temporary files after a certain period. You can adjust the schedule, age threshold, and deletion mode via jadice flow’s configuration. For more details, see the jadice flow Job Deletion documentation.
- You can also trigger job deletion manually via REST:
curl -X POST --location "http://jf-controller.acme.com/admin/trigger-job-deletion" -H "Content-Type: application/json" -d 'true'
- External Storage (e.g., S3): In addition to Output Organizer's cleanup, you may use your storage vendor’s lifecycle management features to automatically remove old or temporary files. Both approaches are valid and can be used in parallel or as preferred.
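If you prefer to trigger the jadice flow job deletion from a script instead of curl, the same request can be built with Python's standard library. This is a minimal sketch: the host is the placeholder from the curl example, and unlike curl's --location, redirect handling is left to urllib's defaults.

```python
import urllib.request

# Placeholder host taken from the curl example; replace it with the address
# of your jadice flow controller.
TRIGGER_URL = "http://jf-controller.acme.com/admin/trigger-job-deletion"


def build_trigger_request(url: str = TRIGGER_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command: JSON body `true`."""
    return urllib.request.Request(
        url,
        data=b"true",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_job_deletion(url: str = TRIGGER_URL, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_trigger_request(url), timeout=timeout) as response:
        return response.status
```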
Automatic Deletion After Export
After a collection has been exported, it can be marked for deletion automatically (controlled by the autoMarkForDeletion setting). This ensures that the collection is scheduled for deletion after the configured grace period, reducing the need for manual intervention.
Configuring Automated Deletion via Helm
You can configure the grace period and the execution interval/time for the deletion job using Helm. Below is an example Helm configuration:
organizer:
  jobs:
    deletion:
      enabled: false # Set to true to enable the deletion job
      cron: "0 0 0 * * ?" # Default cron expression for nightly deletion at midnight
      gracePeriodInDays: 5 # Default grace period of 5 days
      autoMarkForDeletion: false # Set to true to automatically mark a collection and documents (related to that collection) for deletion after export.
      transactionBatchSize: 500 # Default batch size for deletion transactions
Deletion Job Parameters
- enabled: Enables or disables the deletion job.
- cron: Specifies the cron expression for scheduling the deletion job. The default value "0 0 0 * * ?" schedules the job to run every day at midnight.
  - Examples:
    - 0 */5 * * * ? (every 5 minutes)
    - 0 0 0 * * ? (daily at midnight)
    - 0 0 0 * * MON (every Monday at midnight)
- gracePeriodInDays: Specifies the number of days a collection must be marked for deletion before it is actually deleted. The default value is 30 days.
- autoMarkForDeletion: Automatically marks a collection for deletion after export.
- transactionBatchSize: Specifies the batch size used for deletion transactions. The default value is 500.
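The cron expressions above use six fields, with seconds first. Assuming Quartz/Spring-style syntax (which the "?" placeholder suggests), the following Python sketch labels each field, so you can see, for example, that "*/5" in "0 */5 * * * ?" applies to the minute field. It only splits and labels; it does not validate field values, and it rejects the optional seventh (year) field that Quartz allows.

```python
# Field order for the six-field cron format used above (seconds first),
# assuming Quartz/Spring-style syntax.
CRON_FIELDS = ("second", "minute", "hour", "day-of-month", "month", "day-of-week")


def describe_cron(expression: str) -> dict[str, str]:
    """Split a six-field cron expression into labeled fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    for expr in ("0 */5 * * * ?", "0 0 0 * * ?", "0 0 0 * * MON"):
        print(expr, "->", describe_cron(expr))
```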
Only internal documents that belong to collections marked for deletion are deleted along with those collections; documents stored in external sources persist.
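To illustrate how gracePeriodInDays and transactionBatchSize interact, the following Python sketch plans a single run of the deletion job: it selects collections that have been marked for deletion for at least the grace period and groups them into transactions of at most transactionBatchSize. The MarkedCollection type and its marked_at timestamp are hypothetical. The exact boundary (at least vs. strictly more than the grace period) and the assumption that the batch size counts collections are guesses, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator


@dataclass
class MarkedCollection:
    collection_id: str
    marked_at: datetime  # hypothetical: when the state became MarkedForDeletion


def is_due(c: MarkedCollection, now: datetime, grace_period_in_days: int) -> bool:
    # Assumed rule: due once marked for at least grace_period_in_days.
    return now - c.marked_at >= timedelta(days=grace_period_in_days)


def batches(items: list[str], batch_size: int) -> Iterator[list[str]]:
    # Yield consecutive chunks of at most batch_size items.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def plan_run(
    collections: list[MarkedCollection],
    now: datetime,
    grace_period_in_days: int,
    transaction_batch_size: int,
) -> list[list[str]]:
    # Select due collections, then group them into one list per transaction.
    due = [c.collection_id for c in collections if is_due(c, now, grace_period_in_days)]
    return list(batches(due, transaction_batch_size))


if __name__ == "__main__":
    now = datetime(2024, 6, 14, tzinfo=timezone.utc)
    marked = [
        MarkedCollection("a", now - timedelta(days=8)),  # past a 7-day grace period
        MarkedCollection("b", now - timedelta(days=2)),  # still within it
    ]
    print(plan_run(marked, now, grace_period_in_days=7, transaction_batch_size=500))
    # prints [['a']]
```

With a batch size of 500, for example, this sketch splits 1,200 due collections into transactions of 500, 500 and 200.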
Example Usage
Example 1: Basic Deletion Job Configuration
To enable the deletion job and set a custom schedule, modify the Helm values as needed. For instance, to run the deletion job every Friday at midnight with a grace period of 7 days:
organizer:
  jobs:
    deletion:
      enabled: true
      cron: "0 0 0 * * FRI" # Every Friday at midnight
      gracePeriodInDays: 7
      autoMarkForDeletion: true
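If you apply values from the command line rather than editing a values file, the Example 1 settings can be turned into helm arguments. The following Python sketch flattens the nested values into dotted key paths; string values use --set-string, and commas are escaped with a backslash as Helm's --set syntax requires. The release and chart names ("my-release", "my-chart") are placeholders.

```python
EXAMPLE_1 = {
    "organizer": {
        "jobs": {
            "deletion": {
                "enabled": True,
                "cron": "0 0 0 * * FRI",  # Every Friday at midnight
                "gracePeriodInDays": 7,
                "autoMarkForDeletion": True,
            }
        }
    }
}


def to_set_flags(values: dict, prefix: str = "") -> list[str]:
    """Flatten nested Helm values into --set / --set-string arguments."""
    flags: list[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps, extending the dotted key path.
            flags.extend(to_set_flags(value, path))
        elif isinstance(value, str):
            # Keep strings as strings; escape commas, which Helm treats
            # as separators between multiple assignments.
            escaped = value.replace(",", "\\,")
            flags.extend(["--set-string", f"{path}={escaped}"])
        elif isinstance(value, bool):
            # Checked before int handling because bool is a subclass of int.
            rendered = "true" if value else "false"
            flags.extend(["--set", f"{path}={rendered}"])
        else:
            flags.extend(["--set", f"{path}={value}"])
    return flags


if __name__ == "__main__":
    # "my-release" and "my-chart" are placeholders for your release and chart.
    cmd = ["helm", "upgrade", "my-release", "my-chart", "--reuse-values", *to_set_flags(EXAMPLE_1)]
    print(cmd)
```

Passing the arguments as a list (for example to subprocess.run) avoids shell quoting of the spaces, "*" and "?" in the cron expression.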