Add code to generate Zarr dataset #274

jeromekelleher · 2024-12-18T12:27:20Z

Once #270 is done, we can add some more steps to the pipeline to create the Zarr dataset.

This will:

1. Download the metadata and the field-description JSON from Figshare.
2. Call `sc2ts import-alignments dataset.zarr [alignments/*.gz]` (see sc2ts#460: Add support for reading raw MAFFT alignments).
3. Call `sc2ts import-metadata dataset.zarr metadata.tsv.gz --descriptions=field_descriptions.json` (requires sc2ts#459: Add support for "description" for metadata fields). We may want to do this step in Python within Snakemake instead, so that we can call the `massage_viridian_metadata` function.
4. (Optional; probably makes little difference.) Call `Dataset.reorder` to order the samples by the `Date_tree` field.
5. Call `create_zip` to create the final version.
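A minimal sketch of the two import steps as explicit argument lists. It only uses the command lines quoted above. It assumes that `sc2ts import-alignments` accepts several `.gz` files after the Zarr path, as the `[alignments/*.gz]` pattern suggests. The helpers `build_import_commands`, `date_order` and `run` are hypothetical illustrations and are not part of sc2ts. `date_order` also assumes `Date_tree` holds ISO `YYYY-MM-DD` strings. It only shows what "ordering by date" means and makes no assumption about the signature of `Dataset.reorder`.

```python
import glob
import os
import subprocess


def build_import_commands(zarr_path, alignment_dir, metadata_path, descriptions_path):
    """Hypothetical helper: the two sc2ts invocations from the issue as argv lists.

    Assumes the alignments are the *.gz files in alignment_dir. They are sorted so
    the command line (and hence the pipeline run) is deterministic.
    """
    alignments = sorted(glob.glob(os.path.join(alignment_dir, "*.gz")))
    if not alignments:
        raise FileNotFoundError(f"no *.gz alignments found in {alignment_dir}")
    return [
        # Assumption: import-alignments takes multiple alignment files.
        ["sc2ts", "import-alignments", zarr_path, *alignments],
        # --descriptions depends on sc2ts#459 being merged.
        [
            "sc2ts",
            "import-metadata",
            zarr_path,
            metadata_path,
            f"--descriptions={descriptions_path}",
        ],
    ]


def date_order(dates):
    """Hypothetical helper: sample indices sorted by ISO-8601 date string.

    "YYYY-MM-DD" strings sort chronologically as plain strings, and Python's sort
    is stable, so samples sharing a date keep their original relative order.
    """
    return sorted(range(len(dates)), key=lambda i: dates[i])


def run(commands):
    """Execute each command in order, stopping at the first non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

In a Snakefile these would more naturally be `shell:` commands. Spelling them out as argv lists makes the exact inputs of each step explicit. Sorting the glob keeps the ordering reproducible, because `glob` does not guarantee any order.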

With this, we should have a 100% reproducible and automated pipeline for generating the final MAFFT-aligned Zarr dataset from the primary sources.
