[ETL] Issues with monitoring/testing #110
Comments
Joy's comment: The ACs look like they will be costly to implement, since we are leveraging Spring Batch here and the queuing and tables are managed by it. For AC1 (monitoring), we can rely on logs as the source of truth and ignore the DB.
Himesh's comment: In general, I agree with the issues we aim to resolve here, but I have a different approach to resolving them.
Vivek's comment: We are using Quartz, not Spring Batch, for ETL.
We should perhaps use new Date here.
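Taking the timestamp with new Date at the moment the job body actually starts, rather than reusing the time the trigger was scheduled to fire, keeps queueing delay out of the recorded duration. A minimal stdlib sketch, assuming a hypothetical `JobRunRecorder` that stands in for whatever writes `scheduled_job_run` rows; it is not the project's code. (In Quartz itself, `JobExecutionContext` exposes both `getScheduledFireTime()` and `getFireTime()`; this sketch does not use Quartz.)

```java
import java.util.Date;

// Hypothetical stand-in for the code that writes a scheduled_job_run row.
final class JobRunRecorder {
    // Hypothetical row shape: job name plus start/end of the actual execution.
    record JobRun(String jobName, Date startTime, Date endTime) {
        long durationMillis() {
            return endTime.getTime() - startTime.getTime();
        }
    }

    // Stamps start and end with new Date() around the job body itself, so time a
    // trigger spent waiting (e.g. queued behind other orgs) is not counted.
    static JobRun runAndRecord(String jobName, Runnable body) {
        Date start = new Date();
        body.run();
        Date end = new Date();
        return new JobRun(jobName, start, end);
    }

    public static void main(String[] args) {
        // Pretend the trigger was scheduled 15 minutes ago but only got a thread now.
        Date scheduledFireTime = new Date(System.currentTimeMillis() - 15 * 60_000L);
        JobRun run = runAndRecord("org-etl", () -> { /* ETL work would go here */ });
        long naiveMillis = run.endTime().getTime() - scheduledFireTime.getTime();
        System.out.println("duration from scheduled time (ms): " + naiveMillis);
        System.out.println("duration from actual start (ms): " + run.durationMillis());
    }
}
```

Measuring from the scheduled time reports roughly 15 minutes here even though the job itself finished almost instantly, which is the same kind of distortion described for rwbngos2023.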
… actual job execution and add a higher priority trigger for first run of ETL Sync job for an org
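Quartz documents that when several triggers are due at the same fire time, the one with the higher priority (default 5) fires first; priority does not override an earlier fire time. The stdlib sketch below models that ordering with a `PriorityQueue` to show why a higher-priority first-run trigger for a new org would be picked ahead of routine sync triggers due at the same moment. `PendingTrigger`, `TriggerOrdering` and the trigger names are hypothetical; this is not Quartz code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class TriggerOrdering {
    // Hypothetical model of a trigger waiting to fire; 5 mirrors Quartz's default priority.
    record PendingTrigger(String name, long nextFireTimeMillis, int priority) {}

    // Earliest fire time first; among equal fire times, higher priority first.
    static final Comparator<PendingTrigger> FIRING_ORDER =
            Comparator.comparingLong(PendingTrigger::nextFireTimeMillis)
                    .thenComparing(Comparator.comparingInt(PendingTrigger::priority).reversed());

    static PriorityQueue<PendingTrigger> queueOf(PendingTrigger... triggers) {
        PriorityQueue<PendingTrigger> queue = new PriorityQueue<>(FIRING_ORDER);
        for (PendingTrigger t : triggers) {
            queue.add(t);
        }
        return queue;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        PriorityQueue<PendingTrigger> queue = queueOf(
                new PendingTrigger("orgA-sync", now, 5),
                new PendingTrigger("orgB-sync", now, 5),
                new PendingTrigger("orgC-first-run", now, 10));
        // orgC-first-run is polled first; the two equal-priority syncs follow in either order.
        while (!queue.isEmpty()) {
            System.out.println(queue.poll().name());
        }
    }
}
```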
AC1 is fixed as per Vivek's input above and seems to work well. AC2 (Metabase report) is pending. Moving to code review ready so that AC1 and AC3 can be tested.
AC2 Metabase reports: https://reporting.avniproject.org/question/4841-etl-round-completed-in-90-minutes
Alerts can be enabled only after this change is promoted, because the start/end times in scheduled_job_run are inaccurate until then.
Made slight additions to the first report to filter by the "SyncJobs" job_group and to show OrgCategory and OrgStatus values in a readable format. Code review didn't raise any other issues of concern.
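The "ETL round completed in 90 minutes" check amounts to taking the runs in the "SyncJobs" job_group for one round and measuring from the earliest start to the latest end. A minimal sketch, assuming a hypothetical `JobRun` row loosely mirroring scheduled_job_run and assuming the caller has already selected the runs belonging to a single round; it is not the Metabase query itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

final class EtlRoundCheck {
    // Hypothetical row shape loosely mirroring scheduled_job_run.
    record JobRun(String jobGroup, Instant start, Instant end) {}

    static final Duration ROUND_LIMIT = Duration.ofMinutes(90);

    // Span of one ETL round: earliest start to latest end among SyncJobs runs.
    static Duration roundDuration(List<JobRun> runs) {
        List<JobRun> sync = runs.stream()
                .filter(r -> "SyncJobs".equals(r.jobGroup()))
                .toList();
        if (sync.isEmpty()) {
            return Duration.ZERO;
        }
        Instant first = sync.stream().map(JobRun::start).min(Instant::compareTo).orElseThrow();
        Instant last = sync.stream().map(JobRun::end).max(Instant::compareTo).orElseThrow();
        return Duration.between(first, last);
    }

    static boolean exceedsLimit(List<JobRun> runs) {
        return roundDuration(runs).compareTo(ROUND_LIMIT) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        List<JobRun> runs = List.of(
                new JobRun("SyncJobs", t0, t0.plusSeconds(600)),
                new JobRun("SyncJobs", t0.plusSeconds(3000), t0.plusSeconds(6000)),
                new JobRun("OtherJobs", t0, t0.plusSeconds(20000))); // ignored: other group
        System.out.println(roundDuration(runs) + " exceeds limit: " + exceedsLimit(runs));
    }
}
```

Because the span depends only on the earliest start and latest end, a single misrecorded start or end time directly distorts the reported round length.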
Additionally, create the following reports for QA and others to determine ETL job status:
While debugging the issue reported by Achala, the following was observed in the Prod environment:
Discussion thoughts:
This is a good article for understanding Quartz misfire scenarios and the implications of each misfire-handling choice.
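In Quartz, a trigger misfires when it could not fire at its scheduled time (for example, the scheduler was down or no worker thread was free) and it is late by more than the job store's misfire threshold (org.quartz.jobStore.misfireThreshold). What happens next depends on the misfire instruction. The stdlib sketch below models two simplified outcomes for a fixed-interval trigger, loosely modelled on "fire once now" and "do nothing" style instructions. `MisfireModel`, `Policy` and the 60-second threshold are assumptions for illustration, not Quartz's implementation.

```java
// Simplified model of misfire handling for a trigger that repeats every intervalMs.
final class MisfireModel {
    // Loosely modelled on "fire once now" vs "do nothing" style misfire instructions.
    enum Policy { FIRE_NOW, SKIP_TO_NEXT }

    // Assumed value for this sketch; Quartz reads its threshold from
    // org.quartz.jobStore.misfireThreshold.
    static final long MISFIRE_THRESHOLD_MS = 60_000L;

    static boolean isMisfired(long scheduledFireMs, long nowMs) {
        return nowMs - scheduledFireMs > MISFIRE_THRESHOLD_MS;
    }

    // Returns when the trigger will actually fire next.
    static long nextFireTime(long scheduledFireMs, long intervalMs, long nowMs, Policy policy) {
        if (nowMs <= scheduledFireMs) {
            return scheduledFireMs; // not due yet
        }
        if (!isMisfired(scheduledFireMs, nowMs) || policy == Policy.FIRE_NOW) {
            return nowMs; // late but tolerated, or told to catch up immediately
        }
        // Skip the missed slot(s): first slot on the original cadence at or after now.
        long slots = (nowMs - scheduledFireMs + intervalMs - 1) / intervalMs;
        return scheduledFireMs + slots * intervalMs;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 90_000L; // 90 seconds past the scheduled time of 0
        System.out.println("fire now:     " + nextFireTime(0, hour, now, Policy.FIRE_NOW));
        System.out.println("skip to next: " + nextFireTime(0, hour, now, Policy.SKIP_TO_NEXT));
    }
}
```

With the skip-style policy, an org whose trigger misfires waits a full interval before its next ETL; with fire-now, many misfired org triggers can all catch up in one burst after a restart.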
The scenario I am worried about is:
Issues raised regarding ETL runs
Discussion notes with recommendations on how to resolve the issues raised for ETL run management
Note: After changing the repeat-frequency config, we would need to retrigger ETL jobs using the Postman Run Collection capability.
…in past" This reverts commit f50bed2.
Job chaining might also be a viable way for ETL to guarantee execution for all orgs and avoid misfires. It would require us to 'splice' newly scheduled jobs into the chain.
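Job chaining means each org's ETL starts only after the previous one finishes, so a round walks every org in turn instead of relying on independent triggers that may misfire. A minimal stdlib sketch of that idea and of splicing a newly scheduled org into the chain, assuming a hypothetical `EtlChain` that holds org identifiers in a linked list. Quartz ships a `JobChainingJobListener` for chaining real jobs; this sketch does not use it.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical chain of per-org ETL jobs, run one after another.
final class EtlChain {
    private final LinkedList<String> orgs = new LinkedList<>();

    void append(String org) {
        orgs.addLast(org);
    }

    // Splices a newly scheduled org in right after an existing one,
    // or appends it if the existing org is not in the chain.
    void spliceAfter(String existingOrg, String newOrg) {
        int index = orgs.indexOf(existingOrg);
        if (index < 0) {
            orgs.addLast(newOrg);
        } else {
            orgs.add(index + 1, newOrg);
        }
    }

    // One round: each org starts only after the previous one has returned.
    // A failure is caught and recorded so the rest of the chain still runs.
    List<String> runRound(Consumer<String> etl, List<String> failures) {
        List<String> attempted = new ArrayList<>();
        for (String org : orgs) {
            attempted.add(org);
            try {
                etl.accept(org);
            } catch (RuntimeException e) {
                failures.add(org);
            }
        }
        return attempted;
    }

    public static void main(String[] args) {
        EtlChain chain = new EtlChain();
        chain.append("orgA");
        chain.append("orgB");
        chain.spliceAfter("orgA", "orgNew");
        List<String> failures = new ArrayList<>();
        List<String> attempted = chain.runRound(org -> {
            if (org.equals("orgNew")) throw new IllegalStateException("simulated failure");
        }, failures);
        System.out.println("attempted=" + attempted + " failures=" + failures);
    }
}
```

The trade-off is that a round takes the sum of all org durations, and one slow org delays every org after it.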
As per the standup discussion, the two action points for now are as follows:
@himeshr unarchived the card and updated the release to 11.0
Info:
…re changes" This reverts commit 0fbc2cd.
…tity_id and Rank by desc last_modified_date_time
Issue:
ETL for rwbngos2023 completed in a minute, but in the database it looks like it took 15 minutes, which gives a misleading picture.
As the image below also shows, the start times of some jobs are earlier than the end times of other jobs. This suggests that either the start time of the next job or the end time of the previous job is being recorded incorrectly, which makes the ETL jobs hard to monitor.
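The symptom above, job start times falling before other jobs' end times, can be flagged mechanically. A minimal sketch, assuming jobs are expected to run one at a time and that each hypothetical `JobRun` holds the recorded org, start and end: sort by start, track the run that ends latest so far, and report every run that starts before it ended.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OverlapCheck {
    // Hypothetical row shape: org plus recorded start/end of its ETL run.
    record JobRun(String org, Instant start, Instant end) {}

    // Returns "earlier -> later" pairs where the later run started before the earlier one ended.
    static List<String> findOverlaps(List<JobRun> runs) {
        List<String> overlaps = new ArrayList<>();
        if (runs.isEmpty()) {
            return overlaps;
        }
        List<JobRun> sorted = new ArrayList<>(runs);
        sorted.sort(Comparator.comparing(JobRun::start));
        JobRun latestEnding = sorted.get(0); // run with the latest end seen so far
        for (int i = 1; i < sorted.size(); i++) {
            JobRun next = sorted.get(i);
            if (next.start().isBefore(latestEnding.end())) {
                overlaps.add(latestEnding.org() + " -> " + next.org());
            }
            if (next.end().isAfter(latestEnding.end())) {
                latestEnding = next;
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(findOverlaps(List.of(
                new JobRun("orgA", t0, t0.plusSeconds(60)),
                new JobRun("orgB", t0.plusSeconds(30), t0.plusSeconds(90)),
                new JobRun("orgC", t0.plusSeconds(100), t0.plusSeconds(110)))));
    }
}
```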
AC:
- When ETL fails for an organisation whose 'Organisation Category' is production or UAT and whose 'Organisation Status' is Live.
- When one round of ETL takes more than 1.5 hours to complete.
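The two alert conditions can be expressed as a single check. A minimal sketch, assuming a hypothetical `OrgEtlResult` carrying each org's category, status and whether its ETL failed, with the round duration passed in separately; the exact stored strings ("Production", "UAT", "Live") are assumptions, and the 1.5-hour limit is written as 90 minutes.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class EtlAlerts {
    // Hypothetical per-org outcome of one ETL round.
    record OrgEtlResult(String org, String category, String status, boolean failed) {}

    // Assumed label values; the values actually stored may differ.
    static final Set<String> ALERT_CATEGORIES = Set.of("Production", "UAT");
    static final Duration ROUND_LIMIT = Duration.ofMinutes(90);

    // First AC: a failure only matters for Live orgs in the production/UAT categories.
    static boolean failureNeedsAlert(OrgEtlResult r) {
        return r.failed() && ALERT_CATEGORIES.contains(r.category()) && "Live".equals(r.status());
    }

    // Collects alert messages for both ACs; the duration limit is strictly "more than".
    static List<String> alerts(List<OrgEtlResult> results, Duration roundDuration) {
        List<String> out = new ArrayList<>();
        for (OrgEtlResult r : results) {
            if (failureNeedsAlert(r)) {
                out.add("ETL failed for " + r.org());
            }
        }
        if (roundDuration.compareTo(ROUND_LIMIT) > 0) {
            out.add("ETL round took " + roundDuration.toMinutes() + " minutes");
        }
        return out;
    }

    public static void main(String[] args) {
        List<OrgEtlResult> results = List.of(
                new OrgEtlResult("orgA", "Production", "Live", true),
                new OrgEtlResult("orgB", "Trial", "Live", true));
        System.out.println(alerts(results, Duration.ofMinutes(100)));
    }
}
```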
Technical analysis/suggestions:
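```java
import org.quartz.*;
// Build a trigger that fires immediately instead of waiting for the next scheduled slot
Trigger trigger = TriggerBuilder.newTrigger()
        .withIdentity("triggerName", "triggerGroup")
        .startNow()
        .build();
```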
scheduled_job_run table with qrtz_job_details
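Cross-checking the two tables can surface jobs that Quartz has registered but for which no run was ever recorded. A minimal sketch, assuming the job names from qrtz_job_details and from scheduled_job_run have already been loaded as plain strings; `MissingRunCheck` is hypothetical and the real join columns are not specified here.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MissingRunCheck {
    // Jobs registered with Quartz but with no recorded run, sorted for stable output.
    static Set<String> jobsWithoutRuns(Set<String> quartzJobNames, List<String> runJobNames) {
        Set<String> missing = new TreeSet<>(quartzJobNames);
        missing.removeAll(runJobNames);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> registered = Set.of("orgA", "orgB", "orgC");
        List<String> recordedRuns = List.of("orgA", "orgA", "orgC");
        System.out.println("no recorded run: " + jobsWithoutRuns(registered, recordedRuns));
    }
}
```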
Ignore:
What:
Who: