fix: Metastore sync tables with a single transaction #1482
Merged
We were facing some database perf issues, so I looked into it and the SQL query with the most waits was `INSERT INTO 'data_table_warnings'`.
We recently updated our metastore loader code to automatically add warnings to certain tables during the sync. That explains why we're seeing it now, but it didn't seem right, so I looked into it.
While most of the metastore sync executes as a single transaction per table, I realized data table warnings are being committed immediately. I enabled database logging in SQLAlchemy and ran through the metastore sync, which helped identify a few other areas where separate transactions were being committed.
This PR fixes every case of a separate transaction in the `base_metastore_loader`'s `_create_table_table` function that I could detect:

- Added a `commit` arg to functions that were missing it (and passing `commit=False` during the sync)
- Functions with a `commit` arg should always pass `commit=False` to all sub-query functions, since they (optionally) commit at the end of the function

With these changes, each table creates a single transaction that is committed (or rolled back) at the completion of the `_create_table_table()` function.
We deployed this change to production late on July 30th, and the load dropped significantly; all the `row_lock_waits` are gone. Moreover, the metastore loader task runs 100% faster now.
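For reviewers who want the `commit` convention spelled out, here is a minimal sketch of the pattern. It is not the loader's actual code: `FakeSession`, `create_column`, `create_table_warning`, and `sync_table` are hypothetical stand-ins, and the fake session only counts commits so the behavior can be checked without a database.

```python
class FakeSession:
    """Hypothetical stand-in for a database session; it only counts commits."""

    def __init__(self):
        self.pending = []
        self.commit_count = 0

    def add(self, obj):
        self.pending.append(obj)

    def commit(self):
        self.commit_count += 1
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def create_column(table_name, column_name, session, commit=True):
    # Hypothetical sub-query function: commits only when asked to.
    session.add(("column", table_name, column_name))
    if commit:
        session.commit()


def create_table_warning(table_name, message, session, commit=True):
    # Hypothetical sub-query function: commits only when asked to.
    session.add(("warning", table_name, message))
    if commit:
        session.commit()


def sync_table(table_name, columns, warnings, session, commit=True):
    # A function with a commit arg passes commit=False to everything it
    # calls, then (optionally) commits once at the end.
    try:
        session.add(("table", table_name))
        for column_name in columns:
            create_column(table_name, column_name, session, commit=False)
        for message in warnings:
            create_table_warning(table_name, message, session, commit=False)
        if commit:
            session.commit()
    except Exception:
        # Nothing was committed along the way, so one rollback undoes it all.
        session.rollback()
        raise


session = FakeSession()
sync_table("events", ["id", "ts"], ["deprecated"], session)
print(session.commit_count)
```

Because every nested call receives `commit=False`, the whole table sync costs one commit regardless of how many columns or warnings it writes.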
Note: to enable database logging, add `echo=True` to the `create_engine()` call in `db.py`. This let me know exactly when transactions were being committed.
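As a rough illustration of the logging note, the sketch below turns on `echo=True` for a throwaway in-memory SQLite engine. The connection URL, the table `t`, and the statements are assumptions made for the example; the real `create_engine()` call in `db.py` will have its own URL and options.

```python
from sqlalchemy import create_engine, text

# Assumption: an in-memory SQLite database stands in for the real one.
# echo=True makes SQLAlchemy log each statement it emits, along with
# transaction boundaries such as BEGIN, COMMIT, and ROLLBACK.
engine = create_engine("sqlite://", echo=True)

# engine.begin() opens one transaction and commits it when the block
# exits, so the log should show a single COMMIT for both statements.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (id INTEGER)"))
    conn.execute(text("INSERT INTO t (id) VALUES (1)"))

with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM t")).scalar()
```

Counting the `COMMIT` lines in the log while a sync runs is a quick way to spot code paths that commit on their own.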