Drop Old Assets Versions fails on large data set #31105

Open
yolabingo opened this issue Jan 10, 2025 · 1 comment

yolabingo commented Jan 10, 2025

Problem Statement

Tools -> Drop Old Assets Versions fails silently on a Cloud environment with a large amount of content.

Steps to Reproduce

Observed behavior

Job starts at oldest content date (?)

Roughly every 4 minutes, it advances the cutoff date by 30 days:

17:42:26.542  INFO  factories.CMSMaintenanceFactory - -> Dropping versions older than '2009-12-17':
17:46:05.670  INFO  factories.CMSMaintenanceFactory - -> No records were found!
17:46:05.670  INFO  factories.CMSMaintenanceFactory -
17:46:05.671  INFO  factories.CMSMaintenanceFactory - -> Dropping versions older than '2010-01-16':
17:49:40.641  INFO  factories.CMSMaintenanceFactory - -> No records were found!
17:49:40.641  INFO  factories.CMSMaintenanceFactory -
17:49:40.642  INFO  factories.CMSMaintenanceFactory - -> Dropping versions older than '2010-02-15':

This continues for about an hour, after which each 30-day increment takes about 15 minutes. After another hour or so, the job appears to stop running and nothing further is logged.
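To put the observed pacing in perspective, here is a rough sketch in Python that models the 30-day stepping seen in the log and estimates a lower bound on total run time. It is only a model of the logged behavior, not the dotCMS implementation. The function names, the end date (the date this issue was opened), and the flat 4 minutes per window are all assumptions made for illustration.

```python
from datetime import date, timedelta


def window_cutoffs(start, end, step_days=30):
    """Return the cutoff dates a 30-day stepping loop would visit.

    Hypothetical model of the log output: start at the oldest content
    date, add step_days each pass, and finish with the requested end date.
    """
    cutoffs = []
    current = start
    while current < end:
        cutoffs.append(current)
        current += timedelta(days=step_days)
    cutoffs.append(end)
    return cutoffs


def estimated_hours(cutoffs, minutes_per_window=4):
    """Lower-bound run time, assuming every window takes the same time."""
    return len(cutoffs) * minutes_per_window / 60


# Oldest date comes from the log; the end date is an assumption.
cutoffs = window_cutoffs(date(2009, 12, 17), date(2025, 1, 10))
print(cutoffs[:3])
print(len(cutoffs), "windows,", round(estimated_hours(cutoffs), 1), "hours at best")
```

Under these assumptions the job would need well over a hundred passes, and since later passes were observed to take about 15 minutes each, the real run time would be considerably longer than this estimate.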

If you re-run the task, it starts from the oldest date again. This also appears to resume the previous job where it left off, usually around 2012. Both jobs then run for an hour or two and stop.

We are effectively unable to delete old assets.

The full impact of this is not known. In many cases, it does prevent us from using common file transfer tools like rsync and AWS DataSync.

Acceptance Criteria

This job should run to completion on large data sets.

Stepping through the data in 30-day increments is not a good way to keep delete batch sizes reasonable, and the heuristic needs to be reconsidered.
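One possible alternative, sketched here purely as an illustration, is to bound each delete by row count instead of by date window, so that empty date ranges cost nothing and every batch has a predictable size. The sketch uses Python with SQLite for brevity. The `versions` table, its `id`, `mod_date` and `live` columns, and the `delete_in_batches` function are hypothetical and do not reflect the dotCMS schema or code.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete non-live rows older than cutoff in fixed-size batches.

    Hypothetical schema: versions(id, mod_date TEXT in ISO format, live).
    Each batch is committed on its own so an interruption loses at most
    one batch of work. Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM versions WHERE id IN ("
            " SELECT id FROM versions"
            " WHERE mod_date < ? AND live = 0"
            " LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left older than the cutoff
        total += cur.rowcount
    return total
```

With this shape, a re-run simply continues with whatever old rows remain rather than walking again from the oldest date, and the batch size can be tuned to what the database handles comfortably.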

dotCMS Version

23.10

Proposed Objective

Core Features

Proposed Priority

Priority 2 - Important

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

thread https://dotcms.slack.com/archives/C086AG5FJM9/p1736356751816249

@erickgonzalez

@yolabingo do you think this is a dupe of "DropOldContentVersionsJob not running on large dataset"?
