Garbage collecting completed jobs #82

gregwebs · 2023-10-02T17:30:07Z

What is an appropriate strategy for garbage collecting old jobs?

Gue deletes jobs on completion and suggests inserting them into a new table. It suggests doing this in a hook, but doesn't explain how to handle the edge case of a hook panic (the job will be retried automatically).

I like the concept of moving data into an archive table (as suggested in the Gue README), but I would prefer to have the framework manage that, to ensure error conditions are handled well.

acaloiaro · 2023-10-02T17:42:28Z

I think you're right that GC is best left to the library. This request has been sitting in my mental queue for a while.

How I'd like to proceed is:

- Be opinionated: make it "the neoq way" that jobs get moved to neoq_jobs_complete by default.
- Provide a configuration option that allows jobs to simply be purged instead of moved to neoq_jobs_complete.

Thoughts?
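A minimal sketch of what the two behaviors could look like at the SQL level, assuming a PostgreSQL backend, a `neoq_jobs` source table with an `id` column, and an archive table with the same column layout. The function name and both statements are illustrative assumptions, not neoq's implementation.

```go
package main

import "fmt"

// Table names are assumptions for illustration; the archive table uses the
// past-tense name discussed later in this thread.
const (
	jobsTable      = "neoq_jobs"
	completedTable = "neoq_jobs_completed"
)

// completionQuery returns the statement to run for a finished job ($1 = job id).
// With purge=false the row is moved in a single statement via a data-modifying
// CTE, so the job cannot end up in both tables or in neither.
// With purge=true the row is simply deleted.
func completionQuery(purge bool) string {
	if purge {
		return fmt.Sprintf("DELETE FROM %s WHERE id = $1", jobsTable)
	}
	return fmt.Sprintf(
		"WITH moved AS (DELETE FROM %s WHERE id = $1 RETURNING *) INSERT INTO %s SELECT * FROM moved",
		jobsTable, completedTable)
}
```

The `SELECT * FROM moved` form only works if both tables have identical columns in the same order; an explicit column list would be needed otherwise.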

gregwebs · 2023-10-02T18:30:18Z

That sounds perfect to me. Should it be neoq_jobs_completed, past tense?

For actual GC of the completed table, I think it is fine to leave that to the user for now as an out-of-band task, and some users might want to keep it that way. For example, if delete permission were not available to neoq, then actual GC would fail and you would be back to some kind of error-handling callback. It is also possible to provide an example neoq periodic job that does the completed GC. If that job were to fail, then job failure monitoring would kick in.

It seems needed for backwards compatibility to have an option to maintain the current behavior. I can see how there might be a use case where the existing behavior is preferred because everything is in just one table. Although that seems limited if the tables can be combined with UNION.

acaloiaro · 2023-10-02T18:49:30Z

> That sounds perfect to me. Should it be neoq_jobs_completed, past tense?

Yep!

> For actual GC of the completed table, I think it is fine to leave that to the user for now as an out-of-band task, and some users might want to keep it that way. For example, if delete permission were not available to neoq, then actual GC would fail and you would be back to some kind of error-handling callback. It is also possible to provide an example neoq periodic job that does the completed GC. If that job were to fail, then job failure monitoring would kick in.

I'd like to give this more thought, but what I feel most inclined to do here is provide a WithJobArchive config option that would accept options such as MOVE, KEEP, and DESTROY.
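One way such an option could be shaped is with Go's functional-options pattern, sketched below. `WithJobArchive` is the name proposed above; `ArchiveMode`, the three constants, and the stand-in `Config` and `NewConfig` are hypothetical and do not describe an existing neoq API.

```go
package main

// ArchiveMode is a hypothetical enum for the proposed WithJobArchive option.
type ArchiveMode int

const (
	ArchiveMove    ArchiveMode = iota // move completed jobs to the archive table (proposed default)
	ArchiveKeep                       // leave completed jobs in the jobs table (current behavior)
	ArchiveDestroy                    // delete completed jobs outright
)

// Config is a stand-in for the library's configuration struct.
type Config struct {
	JobArchive ArchiveMode
}

// ConfigOption mutates a Config; options are applied in order.
type ConfigOption func(*Config)

// WithJobArchive selects what happens to a job once it completes.
func WithJobArchive(mode ArchiveMode) ConfigOption {
	return func(c *Config) { c.JobArchive = mode }
}

// NewConfig applies the options on top of the opinionated default (move).
func NewConfig(opts ...ConfigOption) *Config {
	c := &Config{JobArchive: ArchiveMove}
	for _, opt := range opts {
		opt(c)
	}
	return c
}
```

Under this sketch, `NewConfig(WithJobArchive(ArchiveKeep))` would preserve the single-table behavior for users who depend on it.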

This behavior would likely live outside the job's transaction, but be dependent upon its success. The reason being: if the job succeeds and its status is updated, we wouldn't want archival to affect the job's success. Failure to archive is a neoq problem, not a problem for the application embedding it.

> It seems needed for backwards compatibility to have an option to maintain the current behavior. I can see how there might be a use case where the existing behavior is preferred because everything is in just one table. Although that seems limited if the tables can be combined with UNION.

I don't feel a strong obligation relative to neoq behavior until it hits 1.0. I have some obligation relative to API stability at this point (I guaranteed no changes to the neoq.Neoq API before 1.0). Currently, my thinking about when it'll reach 1.0 is in another issue. That writeup probably needs to live somewhere else; perhaps in the Status section of the README. I'll give this some thought as well.

gregwebs · 2023-10-02T19:02:53Z

> This behavior would likely live outside the job's transaction, but be dependent upon its success. The reason being: if the job succeeds and its status is updated, we wouldn't want archival to affect the job's success. Failure to archive is a neoq problem, not a problem for the application embedding it.

I like that. At some point it is a user problem, though, so the mechanisms for reporting the problem to the user need to be well defined. Logging at error level is the standard fallback, of course. But I use Sentry for error monitoring, and it would be nice to have a hook to send such errors there. Alternatively, I could build this with a user-defined periodic job that queries for jobs that are in the completed state but have not been archived (at some point these errors fall back to logging if reporting to Sentry doesn't work).

acaloiaro · 2023-10-02T19:05:34Z

Yeah I think a well-structured, error-level log is the bare minimum here. I'm hesitant to provide a use-case-specific hook for this, but conceptually, I think a hook is the best way to truly handle failure.

That brings us to a much broader topic of generalized hooks, which I'm personally not prepared to tackle without some outside help :)

acaloiaro · 2024-01-28T14:22:15Z

Pulling this out of "In Progress", as I think better telemetry and/or event "hooks" should come first.

acaloiaro added the "enhancement" (New feature or request) label Oct 6, 2023

acaloiaro added the "in progress" (Work is in progress for an issue) label Dec 28, 2023

acaloiaro removed the "in progress" (Work is in progress for an issue) label Jan 28, 2024

acaloiaro changed the title ~~Garbage collecting completed jobs?~~ Garbage collecting completed jobs Jan 28, 2024
