
Upload sequences for all segments to S3 #183

Merged
merged 2 commits into from
Oct 15, 2024

@huddlej huddlej commented Oct 14, 2024

Description of proposed changes

Adds all remaining gene segments to the list of segments whose sequences should be uploaded to S3.

Related issue(s)

Checklist

Adds all remaining gene segments to the list of segments whose sequences
should be uploaded to S3.
Since Yam is likely extinct and we've already uploaded its most recent
data to S3 once, we can remove it from the config now. If it returns, we
can add it back.

huddlej commented Oct 14, 2024

The test upload ran without issues for all four lineages and all eight segments. The runtime was 2 hours instead of 45-50 minutes, though. We could reduce the runtime by dropping Yam from the lineages to consider. I'll push a commit next that makes this change.
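As a rough illustration of why dropping Yam should shorten the run, the sketch below counts one upload job per lineage and segment pair. The lineage and segment names are assumptions (the thread only states that there are four lineages and eight segments), and `upload_jobs` is a hypothetical helper, not part of the workflow.

```python
from itertools import product

# Assumed names: the thread only says "four lineages" and "eight segments".
LINEAGES = ["h1n1pdm", "h3n2", "vic", "yam"]
SEGMENTS = ["pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns"]


def upload_jobs(lineages, segments):
    """Return one (lineage, segment) pair per sequence file to upload."""
    return list(product(lineages, segments))


all_jobs = upload_jobs(LINEAGES, SEGMENTS)
# Dropping Yam removes one job per segment.
without_yam = upload_jobs([name for name in LINEAGES if name != "yam"], SEGMENTS)

print(len(all_jobs))     # 32
print(len(without_yam))  # 24
```

Under these assumptions, removing Yam cuts the job count by a quarter. The actual time saved depends on how much data each lineage holds, which this sketch does not model.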


@joverlee521 joverlee521 left a comment


I was slightly worried that the additional segments might overload RethinkDB. However, we are using the Docker runtime within the upload workflow, so we are still limited to just 4 concurrent connections to Fauna.

If we are worried about the additional runtime, we could consider running the additional segments in a separate GH Action workflow that doesn't affect the automated builds.
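To illustrate the point about the connection cap, here is a minimal sketch of how a fixed-size worker pool keeps the number of simultaneous downloads at 4 no matter how many jobs are queued. It is not how the workflow enforces its limit (the comment attributes that to the Docker runtime). `fake_download` and the job names are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 4  # the limit quoted in the comment above

active = 0  # downloads currently in flight
peak = 0    # highest number of simultaneous downloads observed
lock = threading.Lock()


def fake_download(job):
    """Stand-in for one download; records how many run at once."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulate time spent waiting on the database
    with lock:
        active -= 1
    return job


# Hypothetical job list: one entry per lineage and segment pair.
jobs = [f"job-{i}" for i in range(32)]

with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
    results = list(pool.map(fake_download, jobs))

print(peak <= MAX_CONNECTIONS)  # True
```

With a cap like this, adding segments lengthens the queue rather than raising the load on the database, which is consistent with the longer runtime reported above.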


huddlej commented Oct 14, 2024

> If we are worried about the additional runtime, we could consider running the additional segments in a separate GH Action workflow that doesn't affect the automated builds.

That's a good idea. Maybe we can try it this Thursday and see if it is annoyingly slow even without Yam?

This runtime highlights again how incredibly slow the RethinkDB setup is. It shouldn't take hours to download this much data...

@huddlej huddlej merged commit da27f26 into master Oct 15, 2024
3 checks passed
@huddlej huddlej deleted the upload-all-segment-sequences branch October 15, 2024 17:17