Implement Abandoned Job Detection and Recovery #30367
Comments
A job will be considered abandoned after 30 minutes (configurable).
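The threshold rule in the comment above could be sketched roughly as follows. This is only an illustration: `AbandonmentCheck`, `isAbandoned`, and `DEFAULT_THRESHOLD` are hypothetical names, and the sketch assumes that "abandoned" means the job's last recorded update is older than the configured threshold. It is not the project's actual `AbandonedJobDetector`.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not the project's AbandonedJobDetector. */
final class AbandonmentCheck {

    // Assumed default from the comment: 30 minutes, overridable by configuration.
    static final Duration DEFAULT_THRESHOLD = Duration.ofMinutes(30);

    private AbandonmentCheck() {
    }

    /**
     * Returns true when the time elapsed since the job's last update
     * is strictly greater than the threshold.
     */
    static boolean isAbandoned(Instant lastUpdated, Instant now, Duration threshold) {
        return Duration.between(lastUpdated, now).compareTo(threshold) > 0;
    }
}
```

Passing `now` as a parameter rather than calling `Instant.now()` inside the method keeps the check deterministic, which makes it easier to cover in tests.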
…dded `JobEvent` interface to various job event classes and implemented validation logic in `ImportContentletsProcessor`. Introduced the `AbandonedJobDetector` class for detecting and handling abandoned jobs.
…s in transaction handling on the JobQueueManagerAPIImpl
This change updates the detectAndMarkAbandoned method to return an Optional<Job> instead of null. This helps to avoid potential NullPointerExceptions and improves code readability. Corresponding updates were made to the affected classes and integration tests to handle the Optional return type appropriately.
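To illustrate why an `Optional` return reads better than `null` here, the following is a minimal sketch. Only the method name `detectAndMarkAbandoned` and the `Optional<Job>` return type come from the commit message; the parameters, the `Job` record, the `State` enum, and the in-memory list are assumptions made for the example and do not reflect the real `JobQueueManagerAPIImpl`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; the real API's types and parameters may differ. */
final class AbandonedJobSketch {

    // Assumed, simplified job states for the example.
    enum State { RUNNING, ABANDONED }

    // Assumed, simplified job shape for the example.
    record Job(String id, State state, Instant updatedAt) {
        Job withState(State newState) {
            return new Job(id, newState, updatedAt);
        }
    }

    private AbandonedJobSketch() {
    }

    /**
     * Finds the first running job whose last update is older than the
     * threshold and returns a copy marked ABANDONED, or Optional.empty()
     * when no such job exists.
     */
    static Optional<Job> detectAndMarkAbandoned(List<Job> jobs, Instant now, Duration threshold) {
        return jobs.stream()
                .filter(job -> job.state() == State.RUNNING)
                .filter(job -> Duration.between(job.updatedAt(), now).compareTo(threshold) > 0)
                .findFirst()
                .map(job -> job.withState(State.ABANDONED));
    }
}
```

A caller can then write something like `detectAndMarkAbandoned(jobs, now, threshold).ifPresent(job -> ...)` and skip the explicit null check entirely.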
…onfig A protected no-arg constructor was added to AbandonedJobDetectorConfig to comply with CDI requirements. This ensures the class can be properly proxied and managed by the CDI container.
Changes applied as part of the feedback:
Removed obsolete job events and streamlined job state management by introducing more precise states such as `FAILED_PERMANENTLY` and `ABANDONED_PERMANENTLY`. Replaced job completion terminology and refined method signatures and naming conventions for consistency. Enhanced Server-Sent Events (SSE) monitoring with a dedicated utility class for improved performance and error handling.
It looks great now! Some potential improvements:
JOB_ABANDONMENT_THRESHOLD_MINUTES should always be greater than
Parent Issue
#29474
Task
We need to enhance our job queue system to handle abandoned jobs. These are jobs that may have been interrupted due to server crashes, network failures, or other unexpected issues, leaving them in an inconsistent state.
Objective:
Implement mechanisms to detect abandoned jobs and provide recovery strategies to ensure system reliability and data consistency.
Proposed Strategies:
Job Heartbeats:
Timeout Mechanisms:
`max_execution_time` field to job configurations
Recovery Procedures:
Additional Considerations:
Proposed Objective
Core Features
Proposed Priority
Priority 2 - Important
Acceptance Criteria