Metadata includes Nextstrain clades in the Nextclade_pango column #456
Comments
The workflow blindly appends Nextclade TSV outputs. I'm assuming something changed in the outputs between Nextclade versions, leading to this error. (Will dig into this later...) I'm going to cancel the currently running ncov-ingest jobs, upload the |
Uploaded *.renew files
|
Ah right, open workflow is blocked on #455. Oh how all stars aligned this weekend 😅 At least we'll resolve the issue in GISAID metadata for now. |
Nextclade 3.8.0 added new columns for the new relative mutations feature and changed the order of existing output columns:
The change in ordering is the root cause of this issue, but the new columns would have messed things up too. Maybe to guard against this, we should at least have a check that the new Nextclade output columns match the columns of the Nextclade cache. |
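A minimal sketch of the kind of check suggested above, assuming both the cached and the fresh Nextclade outputs are TSV files with a header row. The function names `read_header` and `check_columns_match` and the example column names are illustrative assumptions, not part of ncov-ingest or Nextclade.

```python
import csv
import io


def read_header(tsv_text):
    """Return the column names from the first line of a TSV document."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return next(reader, [])


def check_columns_match(cache_header, new_header):
    """Raise ValueError if the new columns differ from the cache in name or order.

    Hypothetical helper: appending rows without a header is only safe when
    both files have identical columns in identical order.
    """
    if cache_header == new_header:
        return
    missing = [c for c in cache_header if c not in new_header]
    added = [c for c in new_header if c not in cache_header]
    if missing or added:
        raise ValueError(f"Column sets differ: missing={missing}, added={added}")
    raise ValueError("Columns are the same but in a different order")


# Illustrative headers only; the real Nextclade output has many more columns.
cache_header = read_header("seqName\tclade\tNextclade_pango\n")
new_header = read_header("seqName\tNextclade_pango\tclade\n")

try:
    check_columns_match(cache_header, new_header)
except ValueError as err:
    print(f"Refusing to append to cache: {err}")
```

Failing loudly here would turn a silent column shift into an early workflow error, at which point the cache can be renewed.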
Oh no, I always did renew whenever I updated Nextclade datasets, but the software version wasn't on my radar. Thanks for tracking this down! |
That's odd! The order was not supposed to be changed.
Also I need to be more careful when releasing stuff :) |
I don't think this is at all on you @ivan-aksamentov - TSV columns can change; we shouldn't rely on their stability, and we should invalidate the cache properly. |
Oy, the GISAID ingest is still running after 20+ hours...if it runs past 24 hours, the GH Action workflow will show success/complete, but the job will continue to run on AWS Batch. (I will add final stats.json to #446) |
Ah yeah - we almost never run with the 21L touchfiles, so it's quite some extra work to do. Choosing a larger machine in the future might work if we know there's a pending full rerun. Nextclade parallelizes very well, so can go to a machine with say 64 cores or even larger. |
Oh, I'm not sure the workflow is utilizing this because there are no threads defined for the Snakemake rule. |
No need for threads, nextclade uses all by default. What I meant to say was that we can use more cores to speed things up if we find workflows take too long. See comment over at #459 :) |
Verified the full reruns of Nextclade fixed the metadata for both GISAID and open. |
TODOs
Context
Determined as the root cause for nextstrain/forecasts-ncov#104
Both the private GISAID and public open metadata include Nextstrain clades (e.g. 24A (JN.1)) in the Nextclade_pango column.
The timing of the downstream errors coincides with the latest release of Nextclade datasets and the latest release of Nextclade v3.8.0.
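The symptom described above could be detected with a sanity check along these lines. This is a sketch under assumptions: it treats any value that starts with two digits and a capital letter (such as 24A (JN.1)) as a Nextstrain clade label, and the function name `find_clade_like_values` is hypothetical.

```python
import csv
import io
import re

# Assumption: Nextstrain clade labels start with a two-digit year and a
# capital letter (e.g. "24A (JN.1)"), while Pango lineages start with letters.
CLADE_LIKE = re.compile(r"^\d{2}[A-Z]\b")


def find_clade_like_values(tsv_text, column="Nextclade_pango"):
    """Return (data_row_number, value) pairs whose value looks like a clade."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(column) or ""
        if CLADE_LIKE.match(value):
            hits.append((row_number, value))
    return hits


# Illustrative metadata; the second data row shows the shifted-column symptom.
metadata = "strain\tNextclade_pango\ns1\tJN.1\ns2\t24A (JN.1)\n"
print(find_clade_like_values(metadata))
```

Any non-empty result would suggest that column values have shifted and that the Nextclade cache needs a full rerun.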