Attempt to create sandbox data release in nightly builds. #3158
Conversation
Mostly looks good to me. Left a comment about log handling.
@@ -122,6 +127,8 @@ if [[ $ETL_SUCCESS == 0 ]]; then
    if [ $GITHUB_ACTION_TRIGGER = "push" ] || [ $GITHUB_REF = "dev" ]; then
        copy_outputs_to_distribution_bucket
        ETL_SUCCESS=${PIPESTATUS[0]}
        zenodo_data_release 2>&1 | tee -a $LOGFILE
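For context on the pattern used in this diff: in a pipeline like `cmd | tee`, `$?` reports `tee`'s exit status, so the script reads `PIPESTATUS[0]` to capture the status of the left-hand command instead. A minimal sketch, using a hypothetical `fake_release` function as a stand-in for `zenodo_data_release`:

```shell
#!/usr/bin/env bash
# Sketch only: fake_release stands in for zenodo_data_release.
LOGFILE=$(mktemp)

fake_release() { echo "release log line"; return 3; }

# tee copies the output into the log file; $? would be tee's status (0),
# so PIPESTATUS[0] is used to get fake_release's status instead.
fake_release 2>&1 | tee -a "$LOGFILE"
ETL_SUCCESS=${PIPESTATUS[0]}

echo "ETL_SUCCESS=$ETL_SUCCESS"
```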
Given our current logic, the logs from the release script will show up in the log file that is sent to Slack, but they won't show up in the log file copied to the GCS bucket. I think copy_outputs_to_gcs should be called towards the end of the script.
There's a problem here now that we're doing some post-processing for distribution -- removing files we don't want to distribute, gzipping the SQLite DBs. What we copy to GCS is for forensic purposes, which is different from what we're currently shipping to AWS, Kaggle, Zenodo, etc.
Ah, that's right. I guess it's not a huge deal if we don't include the Zenodo release logs in the GCS bucket. We could also change the script to dump the outputs to GCS after the ETL runs, then dump the logs after most of the post-processing/distribution logic has happened.
P.S. Would love to rework this system so all of the logs are collected for us.
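The reordering suggested above could look roughly like this. This is only a sketch: the function names are placeholders, and a local temp directory stands in for the GCS bucket (the real script would use `gsutil cp`):

```shell
#!/usr/bin/env bash
# Sketch of the proposed ordering; run_etl/post_process are placeholders
# and GCS_DIR stands in for $PUDL_GCS_OUTPUT.
LOGFILE=$(mktemp)
GCS_DIR=$(mktemp -d)

run_etl()      { echo "etl finished"; }
post_process() { echo "post-processing finished"; }
copy_log()     { cp "$LOGFILE" "$GCS_DIR/build.log"; }  # real script: gsutil cp $LOGFILE $PUDL_GCS_OUTPUT

run_etl 2>&1 | tee -a "$LOGFILE"
copy_log   # copy outputs + logs for forensics right after the ETL

post_process 2>&1 | tee -a "$LOGFILE"
copy_log   # re-copy so logs from the later steps also land in the bucket
```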
Do you think it'd be better to capture the additional steps in the logs, or only save the outputs we distribute to the nightly build buckets?
I think an overhaul of the nightly build system is definitely on the docket for next year.
I don't think I understand your question:
Do you think it'd be better to capture the additional steps in the logs, or only save the outputs we distribute to the nightly build buckets?
Are you asking which logs should be written to the log file?
Never mind, I was seeing it as either/or based on the ordering, but re-saving the logfile fixes everything.
fi
fi

# This way we also save the logs from latter steps in the script
gsutil cp $LOGFILE ${PUDL_GCS_OUTPUT}
Okay, I added a re-copy of the logfile after everything.
Looks good! This should be merged into main because we are updating a workflow file.
36706d7 to 081182c
Overview
Attempt to create a sandbox data release after the nightly builds succeed. This is just meant to test the data release publication process until we switch over to creating real releases from tagged builds.
I think the workflow needs to be added to main before it'll work in the scheduled builds though. I added the ZENODO_SANDBOX_TOKEN_PUBLISH to the organization secrets.