Sc2ts now uses VCF Zarr (see the preprint) as its input format and has methods for ingesting data in FASTA and TSV format to create a Zarr zipfile. The whole dataset comes to about 300MiB, so it's totally feasible to just deposit it on Figshare. (While this is a bit larger than the compressed FASTAs that Viridian ships, it's a lot more accessible: it gives fast access to the data along both the sample and variant axes, and keeps all the metadata in the same place.)
@iqbal-lab would you be OK with us repackaging the Viridian data like this? It would make sc2ts much more reproducible, as the user could now just download the full dataset in one go and start working immediately. It would also be helpful for me, as I would like to write a case study about the data in the VCF Zarr paper (a whole pandemic's worth of data in one file that can be accessed by variant or sample in milliseconds is pretty useful, in my book!).
Things we need:
- Fully documented pipeline for the MAFFT alignment (@szhan can you comment here?)
- Some "description" fields to accompany the metadata would be very helpful, as it's not entirely obvious what some of the metadata fields mean or where they came from.
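To illustrate why a chunked layout gives fast access along both the sample and variant axes, here is a minimal sketch in plain Python. It is a toy model only: `ChunkedMatrix`, its methods, and the chunk sizes are hypothetical names made up for this example, not the Zarr or sc2ts API. It assumes a matrix with one row per variant site and one column per sample.

```python
# Toy model of a 2D chunked store (hypothetical, NOT the Zarr or sc2ts API).
# Rows are variant sites, columns are samples.
from typing import Dict, List, Tuple


class ChunkedMatrix:
    def __init__(self, data: List[List[int]], chunk_rows: int, chunk_cols: int):
        self.n_rows = len(data)
        self.n_cols = len(data[0]) if data else 0
        self.chunk_rows = chunk_rows
        self.chunk_cols = chunk_cols
        self.chunks_read = 0  # counts how many chunks a query touched
        self.chunks: Dict[Tuple[int, int], List[List[int]]] = {}
        # Split the matrix into rectangular blocks keyed by chunk coordinates.
        for r0 in range(0, self.n_rows, chunk_rows):
            for c0 in range(0, self.n_cols, chunk_cols):
                block = [row[c0:c0 + chunk_cols] for row in data[r0:r0 + chunk_rows]]
                self.chunks[(r0 // chunk_rows, c0 // chunk_cols)] = block

    def _load(self, key: Tuple[int, int]) -> List[List[int]]:
        # Stands in for reading (and decompressing) one chunk from storage.
        self.chunks_read += 1
        return self.chunks[key]

    def variant(self, i: int) -> List[int]:
        """All samples at variant site i: reads one row of chunks."""
        ci, offset = divmod(i, self.chunk_rows)
        n_col_chunks = -(-self.n_cols // self.chunk_cols)  # ceiling division
        out: List[int] = []
        for cj in range(n_col_chunks):
            out.extend(self._load((ci, cj))[offset])
        return out

    def sample(self, j: int) -> List[int]:
        """All variant sites for sample j: reads one column of chunks."""
        cj, offset = divmod(j, self.chunk_cols)
        n_row_chunks = -(-self.n_rows // self.chunk_rows)  # ceiling division
        out: List[int] = []
        for ci in range(n_row_chunks):
            out.extend(row[offset] for row in self._load((ci, cj)))
        return out


if __name__ == "__main__":
    # 6 variant sites x 8 samples, stored as 2 x 4 chunks (6 chunks in total).
    data = [[i * 10 + j for j in range(8)] for i in range(6)]
    m = ChunkedMatrix(data, chunk_rows=2, chunk_cols=4)
    print(m.variant(3), m.chunks_read)
    m.chunks_read = 0
    print(m.sample(5), m.chunks_read)
```

Either query touches only a strip of chunks rather than the whole matrix, which is the property that makes per-variant and per-sample access cheap in a chunked format.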
@martinghunt and I are fine with this, our data is all open and released. In terms of metadata, is there any chance you could dump a list of the metadata fields you are asking about? I think it is all from the ENA, apart from date_tree, which is the result of an algorithm comparing dates with other sources (GenBank, GISAID). Not sure where the ENA metadata is defined.
> @martinghunt and I are fine with this, our data is all open and released.
Fantastic. We'll be very careful to correctly attribute the origins and make it clear we're just repackaging. I'll ping you both when there's a Figshare link to look at, to make sure everyone is happy.