Fix H3 vaccine strains #174
Comments
This has proven to be more complicated than I initially thought... This duplicate location label is already an issue for geolocation assignments, so I thought I would tackle it here as well. I wanted to use the GISAID-provided location metadata to make sure the label and the country matched. Then I realized the virus strain name and the sequence strain name are curated separately, and the sequence data do not include the GISAID location metadata. I can create a map of the virus strain name to the GISAID EPI ISL that can be used to match the sequence strain name instead of curating them separately. However, this idea does not work for the strain names in the titer data because they do not have location metadata. Titer data mostly do not include the GISAID EPI ISL, so I cannot think of how to reliably match titer strain names to the sequence strain names...
Even if I can tackle all of the above, this will only standardize strain names for new uploads.
All that is to say this is taking longer than I would like for fixing this specific data issue, so I plan to do some short-term fixes before tackling the larger issue of standardizing locations in strain names.
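For illustration only, the EPI ISL mapping idea described above could look roughly like the sketch below. The record layout and the field names `strain` and `gisaid_epi_isl` are assumptions made for this example, not fauna's actual schema, and the accession values are placeholders.

```python
def build_isl_to_strain(virus_records):
    """Map GISAID EPI ISL accession -> curated virus strain name.

    Assumes each record is a dict with hypothetical keys
    'strain' and 'gisaid_epi_isl'.
    """
    return {
        rec["gisaid_epi_isl"]: rec["strain"]
        for rec in virus_records
        if rec.get("gisaid_epi_isl")
    }


def harmonize_sequence_strains(sequence_records, isl_to_strain):
    """Return copies of the sequence records whose strain name is replaced
    by the curated virus strain name when the EPI ISL is known."""
    harmonized = []
    for rec in sequence_records:
        curated = isl_to_strain.get(rec.get("gisaid_epi_isl"))
        # Fall back to the existing name when there is no EPI ISL match.
        harmonized.append({**rec, "strain": curated or rec["strain"]})
    return harmonized


viruses = [{"strain": "A/DistrictOfColumbia/27/2023", "gisaid_epi_isl": "EPI_ISL_1"}]
sequences = [
    {"strain": "A/DistrictofColumbia/27/2023", "gisaid_epi_isl": "EPI_ISL_1"},
    {"strain": "A/Example/1/2023", "gisaid_epi_isl": None},
]
result = harmonize_sequence_strains(sequences, build_isl_to_strain(viruses))
```

Records without an EPI ISL pass through unchanged, which is the same gap noted for the titer data: without an accession there is nothing to join on.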
Fixes the specific strain name issues raised in #174 for future uploads. Records already in the database need to be manually deleted and re-uploaded.
Manually cleaned up sequence and titer data in fauna; now running uploads to S3 in seasonal-flu.
As part of clean up in nextstrain/fauna#174, the strain name "A/DistrictofColumbia/27/2023" has been updated to "A/DistrictOfColumbia/27/2023" to follow our standard capitalization of DC in strain names. Removed the now duplicate entries from references_for_titer_plots.
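A minimal sketch of this kind of strain name fix, assuming human strain names of the form `type/location/id/year` and a small hand-maintained lookup table. `LOCATION_FIXES` and `fix_strain_location` are hypothetical names for illustration, not fauna's actual code.

```python
# Hypothetical lookup: lowercased location label -> standard capitalization.
LOCATION_FIXES = {
    "districtofcolumbia": "DistrictOfColumbia",
}


def fix_strain_location(strain):
    """Standardize the location field of a strain name.

    Assumes the location is the second '/'-separated field, which does
    not hold for names that include a host (e.g. A/swine/...).
    """
    parts = strain.split("/")
    if len(parts) < 2:
        return strain
    key = parts[1].lower()
    if key in LOCATION_FIXES:
        parts[1] = LOCATION_FIXES[key]
    return "/".join(parts)


fixed = fix_strain_location("A/DistrictofColumbia/27/2023")
```

Names whose location is not in the table are returned unchanged, so the table can grow one entry at a time as new variants turn up.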
Details on Slack.
TODOs
General improvements:
DistrictofColumnbia -> DistrictOfColumbia
--prioritized_seqs_file option #176
--prioritized_seqs_file option seasonal-flu#203
Data clean up:
A/Croatia/10136/RV/2023 sequences since they are duplicates of A/Croatia/10136RV/2023
A/DistrictofColumbia/27/2023 sequences and reupload so they have A/DistrictOfColumbia/27/2023 strain names
A/Croatia/10136RV/2023 and A/DistrictOfColumbia/27/2023
Post clean up: