
Update backfill scripts for classifications #40

Merged · 4 commits merged into main on Oct 17, 2023

Conversation

@yuenmichelle1 (Collaborator) commented Oct 17, 2023

Keeping both the old script and the new script for the record.

The old script (where we use a straight-up COPY from the source and COPY TO the target db in one go) ended up taking a long time; the query would also cancel out and the connection to the db(s) would be lost. Because of how COPY works, it's hard to debug where the connection gets lost, and because it's an all-or-nothing situation, we end up with no data copied. [This is backfill_classifications.py.]
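For reference, a rough sketch of what that one-go pattern looks like. This is not the actual backfill_classifications.py; the table name, columns, and connection handling here are assumptions, and a real run would stream rather than buffer in memory. The point is that the data only ever lives in the single run, so any dropped connection loses everything:

```python
# Sketch of the single-pass approach (assumed table/column names):
# COPY out of the source, COPY into the target, nothing durable in between.
import io
import psycopg2  # assumed driver

def backfill_in_one_go(source_dsn: str, target_dsn: str) -> None:
    buf = io.StringIO()  # in-memory only: if anything fails, all progress is lost
    src = psycopg2.connect(source_dsn)
    try:
        with src.cursor() as cur:
            cur.copy_expert(
                "COPY (SELECT * FROM classifications) TO STDOUT WITH CSV", buf
            )
    finally:
        src.close()
    buf.seek(0)
    tgt = psycopg2.connect(target_dsn)
    try:
        with tgt.cursor() as cur:
            cur.copy_expert("COPY classifications FROM STDIN WITH CSV", buf)
        tgt.commit()
    finally:
        tgt.close()
```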

To mitigate, we did the following (sketched in code after this list):

  • split up the connections so that:
    1. from the source db, we save everything to CSVs of about 10,000,000 rows each.
    2. once all the CSVs are saved, we close the connection to the source db, since we have everything we need from it.
    3. we connect to the target db and COPY from all the saved CSVs.
    4. we then close the connection to the target db.
      [This is in backfill_classifications_chunk_in_files.py, save_classifications_chunk_in_files.py, then copy_classifications_from_files.py.]
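A minimal sketch of those four steps, assuming psycopg2 and a hypothetical classifications table with an id column (the real logic lives in the scripts named above):

```python
# Sketch of the chunked two-step approach; table/column names are assumptions.
import psycopg2  # assumed driver

CHUNK_SIZE = 10_000_000  # ~10M rows per CSV, matching the PR description

def save_chunks_to_csvs(source_dsn: str, prefix: str = "classifications") -> list[str]:
    """Steps 1-2: dump the source table into CSV chunks, then disconnect."""
    paths = []
    conn = psycopg2.connect(source_dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM classifications")
            total = cur.fetchone()[0]
            for i, offset in enumerate(range(0, total, CHUNK_SIZE)):
                path = f"{prefix}_{i}.csv"
                with open(path, "w") as f:
                    cur.copy_expert(
                        f"""COPY (SELECT * FROM classifications
                                  ORDER BY id
                                  LIMIT {CHUNK_SIZE} OFFSET {offset})
                            TO STDOUT WITH CSV HEADER""",
                        f,
                    )
                paths.append(path)
    finally:
        conn.close()  # step 2: nothing else is needed from the source db
    return paths

def copy_csvs_to_target(target_dsn: str, paths: list[str]) -> None:
    """Steps 3-4: connect to the target only after every CSV exists."""
    conn = psycopg2.connect(target_dsn)
    try:
        with conn.cursor() as cur:
            for path in paths:
                with open(path) as f:
                    cur.copy_expert(
                        "COPY classifications FROM STDIN WITH CSV HEADER", f
                    )
                conn.commit()  # a failure now costs one chunk, not the whole run
    finally:
        conn.close()  # step 4
```

Committing once per file is what makes this restartable: if the target connection drops, at most one chunk has to be re-copied instead of the whole table.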

Some minor changes:

  • in backfill_classifications.py, moved the long 🍑 query onto its own line.
  • updated the query to handle the case where a classification's metadata.user_groups contains null (e.g. [1234, null, null]). This has shown up in some classifications while testing; a sketch of one way to filter it follows below.
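One way to express that null-filtering on the Postgres side. This is a sketch under the assumption that user_groups lives in a jsonb metadata column; the table and column names are hypothetical, and the actual query in the script may differ:

```python
# Hypothetical query fragment: drop JSON nulls from metadata.user_groups,
# e.g. [1234, null, null] -> [1234]. Table/column names are assumptions.
USER_GROUP_IDS_SQL = """
SELECT c.id,
       COALESCE(
           (SELECT jsonb_agg(elem)
              FROM jsonb_array_elements(c.metadata -> 'user_groups') AS elem
             WHERE elem <> 'null'::jsonb),
           '[]'::jsonb  -- jsonb_agg over zero rows yields NULL, so default to []
       ) AS user_group_ids
  FROM classifications c
"""
```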

@yuenmichelle1 requested a review from @zwolf on Oct 17, 2023, 15:37
@zwolf (Member) left a comment


Looks good, though if it were me I'd probably be less inclined to run the export and the import as a single script. Since you're creating the files anyway, you have the opportunity to create them all and confirm success, then be done with that step. Your call of course, but in light of previous mystery issues I'd probably break this into two scripts/steps: one to create the files and one to read them.

@yuenmichelle1 merged commit 2f8457a into main on Oct 17, 2023 · 3 checks passed