refacto dataset version query #2683
✅ Deploy Preview for peppy-sprite-186812 canceled.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2683   +/-   ##
=========================================
  Coverage     84.05%   84.05%
  Complexity     1379     1379
=========================================
  Files           248      248
  Lines          6297     6297
  Branches        286      286
=========================================
  Hits           5293     5293
  Misses          851      851
  Partials        153      153
659040f to 9503c89
Signed-off-by: sophiely <[email protected]>
Signed-off-by: sophiely <[email protected]>
9503c89 to 8230d17
Great work!
Great job! Congrats on your first merged pull request in the Marquez project!
Problem
The SQL query that fetches a dataset version is quite slow and often results in a timeout.
Closes: 2684
Solution
In this query, the JSONB_AGG operation takes the most time, so I rewrote the query to run this heavy operation after all JOINs and filters have been applied, so that it only aggregates the rows that survive them.
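The idea can be sketched with a small, self-contained example. This is not the actual Marquez query: the table names, column names, and sample rows below are hypothetical, and SQLite's group_concat stands in for PostgreSQL's JSONB_AGG so the sketch runs with only the Python standard library. It only illustrates that aggregating before or after the JOIN and filters gives the same result, while the second form aggregates far fewer rows.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the real Marquez tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset_versions (
        uuid TEXT PRIMARY KEY,
        namespace_name TEXT,
        dataset_name TEXT
    );
    CREATE TABLE fields (version_uuid TEXT, name TEXT);
    INSERT INTO dataset_versions VALUES
        ('v1', 'ns', 'orders'),
        ('v2', 'ns', 'users');
    INSERT INTO fields VALUES
        ('v1', 'id'), ('v1', 'total'),
        ('v2', 'id'), ('v2', 'email');
""")

# Before: the aggregate is written over every version, then joined and filtered.
# group_concat is a stand-in for JSONB_AGG (assumption for this sketch).
AGGREGATE_FIRST = """
    SELECT dv.uuid, agg.field_names
    FROM dataset_versions dv
    JOIN (
        SELECT version_uuid, group_concat(name) AS field_names
        FROM fields
        GROUP BY version_uuid
    ) agg ON agg.version_uuid = dv.uuid
    WHERE dv.namespace_name = ? AND dv.dataset_name = ?
"""

# After: join and filter first, then aggregate only the remaining rows.
FILTER_FIRST = """
    SELECT dv.uuid, group_concat(f.name) AS field_names
    FROM dataset_versions dv
    JOIN fields f ON f.version_uuid = dv.uuid
    WHERE dv.namespace_name = ? AND dv.dataset_name = ?
    GROUP BY dv.uuid
"""

def run(query, namespace, dataset):
    """Run a query and return (uuid, sorted field names) per version."""
    rows = conn.execute(query, (namespace, dataset)).fetchall()
    # group_concat does not guarantee order, so sort before comparing.
    return [(uuid, sorted(names.split(","))) for uuid, names in rows]

before = run(AGGREGATE_FIRST, "ns", "orders")
after = run(FILTER_FIRST, "ns", "orders")
```

Both forms return the same rows for the requested namespace and dataset name. How much the second form helps in practice depends on the query planner and the data volume, so it is worth confirming with EXPLAIN ANALYZE on the real query.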
For a given namespace and dataset name and a db.t4g.medium (vCPU: 2, RAM: 4 GB) machine:
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant)