Add code to generate Zarr dataset #274

Open
jeromekelleher opened this issue Dec 18, 2024 · 0 comments

Once #270 is done, we can add some more steps to the pipeline to create the Zarr dataset.

This will:

  1. Download the metadata and the field-description JSON from Figshare.
  2. Call `sc2ts import-alignments dataset.zarr [alignments/*.gz]` (see sc2ts#460: Add support for reading raw MAFFT alignments).
  3. Call `sc2ts import-metadata dataset.zarr metadata.tsv.gz --descriptions=field_descriptions.json` (requires sc2ts#459: Add support for "description" for metadata fields). We may want to do this in Python within Snakemake so that we can call the `massage_viridian_metadata` function.
  4. (Optional; probably doesn't make much difference.) Call `Dataset.reorder` to reorder by the `Date_tree` field.
  5. Call `create_zip` to create the final version.
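
As a rough illustration of steps 2 and 3, here is a minimal Python sketch that assembles the two `sc2ts` command lines quoted above and, by default, only prints them. The helper names `build_commands` and `run_pipeline` are hypothetical, the reading of `[alignments/*.gz]` as a list of alignment files is an assumption, and the `--descriptions` option depends on sc2ts#459. Steps 1, 4 and 5 are left out because the Figshare URLs and the signatures of `Dataset.reorder` and `create_zip` are not given here.

```python
import glob
import subprocess


def build_commands(dataset, alignment_files, metadata, descriptions):
    """Return the argv lists for steps 2 and 3, in execution order.

    Hypothetical helper: the command names and arguments are copied from
    the steps above; nothing here is part of the sc2ts API.
    """
    # Step 2: import the alignment files into the Zarr dataset.
    # Assumes "[alignments/*.gz]" means "one or more alignment files".
    import_alignments = ["sc2ts", "import-alignments", dataset, *alignment_files]
    # Step 3: import the metadata together with the field descriptions.
    # The --descriptions option depends on sc2ts#459.
    import_metadata = [
        "sc2ts",
        "import-metadata",
        dataset,
        metadata,
        f"--descriptions={descriptions}",
    ]
    return [import_alignments, import_metadata]


def run_pipeline(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


commands = build_commands(
    "dataset.zarr",
    sorted(glob.glob("alignments/*.gz")),
    "metadata.tsv.gz",
    "field_descriptions.json",
)
run_pipeline(commands, dry_run=True)
```

In a Snakemake workflow each command would more naturally be its own rule, with `dataset.zarr` as the declared output, so that the steps are re-run only when their inputs change.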

With this, we should have a fully reproducible and automated pipeline for generating the final MAFFT-aligned Zarr dataset from the primary sources.
