Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phylogenetic updates to match pathogen-repo-guide #238

Merged
merged 16 commits into from
Feb 26, 2024

Conversation

joverlee521
Copy link
Contributor

@joverlee521 joverlee521 commented Feb 15, 2024

Includes major file restructuring within the phylogenetic workflow to match the pathogen-repo-guide:

  • rename workflow/snakemake_rules/ to rules/
  • rename config/ to defaults/
  • separation of custom build-configs
  • split cores.smk into separate snakefiles
  • misc. small updates - see commits for details

I had originally wanted to create a default config YAML as part of this PR. Then remembered this was already done in #173 and changed back to not use a default config in 5019420...

I'll leave this potentially controversial change to another PR. It will most likely involve some changes to the config schema as well as I was finding some potentially confusing uses of the config parameters (e.g. other_filters for non_b1 vs b1).

@joverlee521 joverlee521 mentioned this pull request Feb 16, 2024
4 tasks
Part of work to update this repo to match the pathogen-repo-guide.

Simplify directory structure by putting the core Snakemake rules in
a "rules" directory.
Part of work to update this repo to match the pathogen-repo-guide.

Renamed in preparation for adding the "build-configs" directory in
following commits to hold additional build configurations and
customizations.
Part of work to update this repo to match the pathogen-repo-guide.

Moves the exisitng CI build config and customizations to the new
"build-configs" directory. Includes renaming of the `builds.yaml` to
`config.yaml` since the file no longer clashes with Snakemake Profiles.
Part of work to update this repo to match the pathogen-repo-guide.

Since the chores rules are internal Nextstrain rules, they do not need
to be part of the core workflow. This also resolves the DAG confusion
that Snakemake occassionally runs into when running CI locally.¹

README.md includes the new instructions on how to invoke the workflow to
update the example data. This requires two config files:

1. The CI config to provide all required config params and to ensure
the example data uses correct `strain_id_field` for CI builds.
2. The chores config to include the custom rules

¹ #237
Part of work to update this repo to match the pathogen-repo-guide.
Part of work to update this repo to match the pathogen-repo-guide.

LAPIS was the default data source until we switched to using
data.nextstrain.org.¹ This commit moves the LAPIS related rules to a
custom build-config and is able to successfully override the default
download/decompress rules. However, there are downstream changes to the
workflow that cause errors with the LAPIS data, e.g. hard-coded filters
on columns do not exist in the LAPIS data.²

Rather than spending the time to make sure the workflow runs with the
LAPIS data, I will remove the lapis build-configs in the next commit.
I wanted to do so separately to be able to easily revert the removal
if needed.

¹ 5d0791a
² https://github.com/nextstrain/mpox/blob/34feb4a4e6f4d8b7929cc701734b15b8a5e6a6fc/phylogenetic/workflow/snakemake_rules/core.smk#L43
Removing build-configs/lapis instead of spending the time to make sure
the workflow runs with the LAPIS data.

If others need/want to use the LAPIS data, this custom build config can
be easily restored.
Part of work to update this repo to match the pathogen-repo-guide.

Following commit will move other rules for preparing sequences from
core.smk to prepare_sequences.smk.
Part of work to update this repo to match the pathogen-repo-guide.
Part of work to update this repo to match the pathogen-repo-guide.
Part of work to update this repo to match the pathogen-repo-guide.
Part of work to update this repo to match the pathogen-repo-guide.

The remaining rules in core.smk are all related to the final export
of the Auspice JSONs.
Automated builds have been deployed directly to production since
4017f51.
If `auspice_name` param is not provided, then the workflow will use
the required `build_name` as the default Auspice filename.

Ensures that we are following the Nextstrain Snakemake style guide for
configs¹ by not accessing configs with `config.get(key)`.

¹ https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#access-config-values-appropriately
Follow the Nextstrain Snakemake style guide¹ to use literal path strings
as inputs for the core workflow rules.

This will make it possible for users to override the default workflow
with `custom_rules`.

¹ https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#define-input-paths-with-literal-path-strings
@joverlee521 joverlee521 force-pushed the update-phylo-to-template branch from 38441eb to ce5c2d5 Compare February 16, 2024 19:10
@joverlee521
Copy link
Contributor Author

joverlee521 commented Feb 16, 2024

Rebased to pull in changes from #239.

Running trial builds for

@joverlee521 joverlee521 requested a review from a team February 20, 2024 20:00
Copy link
Contributor

@j23414 j23414 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The move and rename commits to match pathogen-repo-guide look correct to me, so there are no blocking comments from me.

I can't speak to the lapis changes but it does seem like the staged builds completed.

@joverlee521
Copy link
Contributor Author

Thanks for the review @j23414! I'll merge and monitor the automatic builds over week.

If there are issues with the removal of LAPIS, we can always revert the change in the future.

@joverlee521 joverlee521 merged commit e439235 into master Feb 26, 2024
44 checks passed
@joverlee521 joverlee521 deleted the update-phylo-to-template branch February 26, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants