Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Metastore sync tables with a single transaction #1482

Merged

Conversation

baumandm
Copy link
Contributor

@baumandm baumandm commented Aug 2, 2024

We were facing some database perf issues, so I looked into it and the SQL query with the most waits was INSERT INTO 'data_table_warnings':

Screenshot 2024-07-12 at 10 37 24 AM

We recently updated our metastore loader code to automatically add warnings to certain tables during the sync. That explains why we're seeing it now, but it didn't seem right so I looked into it.

While most of the metastore sync executes as a single transaction per table, I realized data table warnings are being committed immediately. I enabled database logging in SQLAlchemy and ran through the metastore sync, which helped identify a few other areas where separate transactions were being committed.

This PR fixes every case of a separate transaction in the base_metastore_loader's _create_table_table function that I could detect.

  1. Where necessary, I added a commit arg to functions that were missing it (and passing commit=False during the sync)
  2. Functions that have a commit arg should always pass commit=False to all sub-query functions, since they (optionally) commit at the end of the function

With these changes, each table creates a single transaction that is committed (or rolled back) at the completion of the _create_table_table() function.

We deployed this change to production late on July 30th, and the load dropped significantly; all the row_lock_waits are gone:

Screenshot 2024-08-02 at 10 33 16 AM

Moreover, the metastore loader task runs 100% faster now:

Screenshot 2024-08-02 at 5 19 24 PM

Note: to enable database logging, add echo=True to the create_engine() call in db.py. This let me know exactly when transactions were being committed.

Copy link
Collaborator

@czgu czgu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@czgu czgu merged commit 8ca6736 into pinterest:master Aug 13, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants