Phylogenetic updates to match pathogen-repo-guide #238

Merged: 16 commits, Feb 26, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -21,7 +21,7 @@ jobs:
run: |
nextstrain build \
phylogenetic \
--configfile profiles/ci/builds.yaml
--configfiles build-configs/ci/config.yaml
artifact-name: output-${{ matrix.runtime }}
artifact-paths: |
phylogenetic/auspice/
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-hmpxv1-big.yaml
@@ -63,5 +63,5 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
--configfiles $BUILD_DIR/config/$BUILD_NAME/config.yaml $BUILD_DIR/config/nextstrain_automation.yaml \
--configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
$CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-hmpxv1.yaml
@@ -63,5 +63,5 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
--configfiles $BUILD_DIR/config/$BUILD_NAME/config.yaml $BUILD_DIR/config/nextstrain_automation.yaml \
--configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
$CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
2 changes: 1 addition & 1 deletion .github/workflows/rebuild-mpxv.yaml
@@ -63,5 +63,5 @@ jobs:
--env SLACK_CHANNELS \
. \
notify_on_deploy \
--configfiles $BUILD_DIR/config/$BUILD_NAME/config.yaml $BUILD_DIR/config/nextstrain_automation.yaml \
--configfiles $BUILD_DIR/defaults/$BUILD_NAME/config.yaml $BUILD_DIR/build-configs/nextstrain-automation/config.yaml \
$CONFIG_OVERRIDES --directory $BUILD_DIR --snakefile $BUILD_DIR/Snakefile
53 changes: 21 additions & 32 deletions phylogenetic/README.md
@@ -11,7 +11,7 @@ for Nextstrain's suite of software tools.
## Usage

If you're unfamiliar with Nextstrain builds, you may want to follow our
[Running a Pathogen Workflow guide][] first and then come back here.
[Running a Pathogen Workflow guide](https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html) first and then come back here.

The easiest way to run this pathogen build is using the Nextstrain
command-line tool from within the `phylogenetic/` directory:
@@ -28,7 +28,7 @@ Once you've run the build, you can view the results with:
You can run an example build using the example data provided in this repository via:

```
nextstrain build . --configfile profiles/ci/builds.yaml
nextstrain build . --configfile build-configs/ci/config.yaml
```

When the build has finished running, view the output Auspice trees via:
@@ -61,43 +61,21 @@ nextstrain build . data/sequences.fasta data/metadata.tsv
Run pipeline to produce the "overview" tree for `/mpox/all-clades` with:

```bash
nextstrain build . --configfile config/mpxv/config.yaml
nextstrain build . --configfile defaults/mpxv/config.yaml
```

Run pipeline to produce the "clade IIb" tree for `/mpox/clade-IIb` with:

```bash
nextstrain build . --configfile config/hmpxv1/config.yaml
nextstrain build . --configfile defaults/hmpxv1/config.yaml
```

Run pipeline to produce the "lineage B.1" tree for `/mpox/lineage-B.1` with:

```bash
nextstrain build . --configfile config/hmpxv1_big/config.yaml
nextstrain build . --configfile defaults/hmpxv1_big/config.yaml
```

### Deploy

⚠️ The below is outdated and needs to be adjusted for the new build names (mpox instead of monkeypox, etc.)

<details>

Run the python script [`scripts/deploy.py`](scripts/deploy.py) to deploy the staging build to production.

This will also automatically create a dated build where each node has a unique (random) ID so it can be targeted in shared links/narratives.

```bash
python scripts/deploy.py --build-names hmpxv1 mpxv
```

If a dated build already exists it is not overwritten by default. To overwrite, pass `-f`.

To deploy a locally built build to staging, use the `--staging` flag.

To not deploy a dated build to production, add the `--no-dated` flag.

</details>

### Visualize results

View results with:
@@ -108,19 +86,30 @@ nextstrain view .

## Configuration

Configuration takes place in `config/*/config.yaml` files for each build.
The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`.
The default configuration takes place in `defaults/*/config.yaml` files for each build.
The analysis pipeline is contained in `rules/core.smk`.
This can be read top-to-bottom: each rule specifies its file inputs and outputs and pulls its parameters from `config`.
There is little redirection, and each rule should be understandable on its own.

### Custom build configs

The build-configs directory contains configs and customizations that override and/or extend the default workflow.

- [chores](build-configs/chores/) - internal Nextstrain chores such as [updating the example data](#update-example-data).
- [ci](build-configs/ci/) - CI build that runs the [example build](#example-build) with the [example data](example_data/).
- [nextstrain-automation](build-configs/nextstrain-automation/) - internal Nextstrain automated builds.

## Update example data

[Example data](./example_data/) is used by [CI](https://github.com/nextstrain/mpox/actions/workflows/ci.yaml). It can also be used as a small subset of real-world data.
[Example data](./example_data/) is used by [CI](https://github.com/nextstrain/mpox/actions/workflows/ci.yaml).
It can also be used as a small subset of real-world data.

Example data should be updated every time metadata schema is changed or a new clade/lineage emerges. To update, run:
Example data should be updated every time metadata schema is changed or a new clade/lineage emerges.
To update, run:

```sh
nextstrain build . update_example_data -F
nextstrain build . update_example_data -F \
--configfiles build-configs/ci/config.yaml build-configs/chores/config.yaml
```
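
The command above passes two config files, and both of the files shown in this diff define `custom_rules`. The sketch below illustrates one plausible merge behavior: files are applied left to right, nested mappings are merged key by key, and every other value (lists included) is replaced by the later file. The function `merge_configs` and the trimmed-down dictionaries are hypothetical stand-ins, not Snakemake's actual implementation, so it is worth confirming the real behavior against the Snakemake documentation.

```python
from collections.abc import Mapping
from copy import deepcopy


def merge_configs(*configs):
    """Merge config dicts left to right; later values win.

    Assumption: nested mappings are merged key by key, while any other
    value (including lists such as `custom_rules`) is replaced wholesale.
    """

    def _update(base, override):
        for key, value in override.items():
            if isinstance(value, Mapping) and isinstance(base.get(key), Mapping):
                _update(base[key], value)
            else:
                base[key] = deepcopy(value)
        return base

    merged = {}
    for cfg in configs:
        _update(merged, cfg)
    return merged


# Hypothetical, trimmed-down stand-ins for the two config files.
ci_config = {
    "custom_rules": ["build-configs/ci/copy_example_data.smk"],
    "build_name": "hmpxv1",
    "filter": {"min_date": 2017, "min_length": 100000},
}
chores_config = {
    "custom_rules": ["build-configs/chores/chores.smk"],
}

merged = merge_configs(ci_config, chores_config)
print(merged["custom_rules"])
print(merged["filter"])
```

Under these assumptions the merged `custom_rules` list contains only the chores rule file, while `build_name` and `filter` carry over from the CI config unchanged.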

## Data use
Expand Down
28 changes: 7 additions & 21 deletions phylogenetic/Snakefile
@@ -12,18 +12,16 @@ if version.parse(augur_version) < version.parse(min_augur_version):

if not config:

configfile: "config/hmpxv1/config.yaml"
configfile: "defaults/hmpxv1/config.yaml"


build_dir = "results"


auspice_dir = "auspice"

prefix = config.get("auspice_prefix", None)
AUSPICE_PREFIX = ("trial_" + prefix + "_") if prefix is not None else ""
AUSPICE_FILENAME = AUSPICE_PREFIX + config.get("auspice_name")

# Defaults to the `build_name` if no `auspice_name` is provided in the config
AUSPICE_FILENAME = AUSPICE_PREFIX + config.get("auspice_name", config["build_name"])

rule all:
input:
@@ -39,22 +37,10 @@ rule all:
"""


if config.get("data_source", None) == "lapis":

include: "workflow/snakemake_rules/download_via_lapis.smk"

else:

include: "workflow/snakemake_rules/prepare.smk"


include: "workflow/snakemake_rules/chores.smk"
include: "workflow/snakemake_rules/core.smk"


if config.get("deploy_url", False):

include: "workflow/snakemake_rules/nextstrain_automation.smk"
include: "rules/prepare_sequences.smk"
include: "rules/construct_phylogeny.smk"
include: "rules/annotate_phylogeny.smk"
include: "rules/export.smk"


# Include custom rules defined in the config.
Expand Down
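
The `AUSPICE_FILENAME` change in the Snakefile hunk above can be exercised outside Snakemake. This is a minimal sketch that wraps the same expressions in a hypothetical helper, `auspice_filename`, and feeds it plain dictionaries. The prefix value `"test"` is made up for illustration.

```python
def auspice_filename(config):
    """Mirror the filename logic from the Snakefile hunk (helper name is hypothetical)."""
    prefix = config.get("auspice_prefix", None)
    auspice_prefix = ("trial_" + prefix + "_") if prefix is not None else ""
    # The fallback argument is evaluated eagerly, so `build_name` must be
    # present in the config even when `auspice_name` is set.
    return auspice_prefix + config.get("auspice_name", config["build_name"])


# `auspice_name` is set: it is used as-is.
print(auspice_filename({"build_name": "hmpxv1", "auspice_name": "mpox_clade-IIb"}))
# `auspice_name` is missing: falls back to `build_name`.
print(auspice_filename({"build_name": "hmpxv1"}))
# A prefix turns the output into a "trial_" build.
print(auspice_filename({"build_name": "hmpxv1", "auspice_prefix": "test"}))
```

One consequence of writing the fallback as `config.get("auspice_name", config["build_name"])` is that a config without `build_name` raises `KeyError` even if `auspice_name` is provided. The previous form, `config.get("auspice_name")` with no default, would instead fail with a `TypeError` when `auspice_name` was missing, because `None` cannot be concatenated to a string.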
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
# I was hoping to use the Snakemake `default_target` directive to make this the
# default target when including this rule via `custom_rules`, but that is
# currently not possible: https://github.com/snakemake/snakemake/issues/2056
rule update_example_data:
"""This updates the files under example_data/ based on latest available data from data.nextstrain.org.

Expand Down
2 changes: 2 additions & 0 deletions phylogenetic/build-configs/chores/config.yaml
@@ -0,0 +1,2 @@
custom_rules:
- build-configs/chores/chores.smk
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
custom_rules:
- profiles/ci/copy_example_data.smk
- build-configs/ci/copy_example_data.smk

reference: "config/reference.fasta"
genemap: "config/genemap.gff"
genbank_reference: "config/reference.gb"
include: "config/hmpxv1/include.txt"
clades: "config/clades.tsv"
lat_longs: "config/lat_longs.tsv"
auspice_config: "config/hmpxv1/auspice_config.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"
reference: "defaults/reference.fasta"
genemap: "defaults/genemap.gff"
genbank_reference: "defaults/reference.gb"
include: "defaults/hmpxv1/include.txt"
clades: "defaults/clades.tsv"
lat_longs: "defaults/lat_longs.tsv"
auspice_config: "defaults/hmpxv1/auspice_config.json"
description: "defaults/description.md"
tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/monkeypox/issues/33
@@ -20,7 +20,7 @@ build_name: "hmpxv1"
auspice_name: "mpox_clade-IIb"

filter:
exclude: "config/exclude_accessions.txt"
exclude: "defaults/exclude_accessions.txt"
min_date: 2017
min_length: 100000

@@ -81,4 +81,4 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
maskfile: "config/mask.bed"
maskfile: "defaults/mask.bed"
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# Optional configs to include for automated Nextstrain builds
# Intended to be used internally by the Nextstrain team

custom_rules:
- build-configs/nextstrain-automation/nextstrain-automation.smk

# deploy
deploy_url: "s3://nextstrain-data"
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ Our bioinformatic processing workflow can be found at [github.com/nextstrain/mpo
- masking several regions of the genome, including the first 1350 and last 6422 base pairs and multiple repetitive regions of variable length
- phylogenetic reconstruction using [IQTREE-2](http://www.iqtree.org/)
- ancestral state reconstruction and temporal inference using [TreeTime](https://github.com/neherlab/treetime)
- clade assignment via [clade definitions defined here](https://github.com/nextstrain/mpox/blob/master/config/clades.tsv), to label broader MPXV clades I, IIa and IIb and to label hMPXV1 lineages A, A.1, A.1.1, etc...
- clade assignment via [clade definitions defined here](https://github.com/nextstrain/mpox/blob/master/defaults/clades.tsv), to label broader MPXV clades I, IIa and IIb and to label hMPXV1 lineages A, A.1, A.1.1, etc...
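
The end-masking step in the list above corresponds to the `mask.from_beginning` and `mask.from_end` values in the build configs. As a rough illustration of what masking the ends of a genome means, the sketch below replaces the first and last bases of a sequence with `N`. The function `mask_ends` is hypothetical and is not the workflow's own masking code; the short sequence and the small offsets are chosen only to keep the example readable.

```python
def mask_ends(sequence, from_beginning, from_end, mask_char="N"):
    """Replace the first `from_beginning` and last `from_end` characters with `mask_char`."""
    length = len(sequence)
    # Clamp both boundaries so short sequences are fully masked rather than failing.
    start = min(from_beginning, length)
    end = max(length - from_end, start)
    return mask_char * start + sequence[start:end] + mask_char * (length - end)


print(mask_ends("ACGTACGTAC", from_beginning=2, from_end=3))
```

With the values from the configs, the same call would take `from_beginning=800` or `1350` and `from_end=6422` on a full-length genome.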

#### Underlying data
We curate sequence data and metadata from the [NCBI Datasets command line tools](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/),
Expand Down
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
reference: "config/reference.fasta"
genemap: "config/genemap.gff"
genbank_reference: "config/reference.gb"
include: "config/hmpxv1/include.txt"
clades: "config/clades.tsv"
lat_longs: "config/lat_longs.tsv"
auspice_config: "config/hmpxv1/auspice_config.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"
reference: "defaults/reference.fasta"
genemap: "defaults/genemap.gff"
genbank_reference: "defaults/reference.gb"
include: "defaults/hmpxv1/include.txt"
clades: "defaults/clades.tsv"
lat_longs: "defaults/lat_longs.tsv"
auspice_config: "defaults/hmpxv1/auspice_config.json"
description: "defaults/description.md"
tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -17,7 +17,7 @@ build_name: "hmpxv1"
auspice_name: "mpox_clade-IIb"

filter:
exclude: "config/exclude_accessions.txt"
exclude: "defaults/exclude_accessions.txt"
min_date: 2017
min_length: 100000

@@ -78,4 +78,4 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
maskfile: "config/mask.bed"
maskfile: "defaults/mask.bed"
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
reference: "config/reference.fasta"
genemap: "config/genemap.gff"
genbank_reference: "config/reference.gb"
include: "config/hmpxv1_big/include.txt"
clades: "config/clades.tsv"
lat_longs: "config/lat_longs.tsv"
auspice_config: "config/hmpxv1_big/auspice_config.json"
description: "config/description.md"
tree_mask: "config/tree_mask.tsv"
reference: "defaults/reference.fasta"
genemap: "defaults/genemap.gff"
genbank_reference: "defaults/reference.gb"
include: "defaults/hmpxv1_big/include.txt"
clades: "defaults/clades.tsv"
lat_longs: "defaults/lat_longs.tsv"
auspice_config: "defaults/hmpxv1_big/auspice_config.json"
description: "defaults/description.md"
tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -17,7 +17,7 @@ build_name: "hmpxv1_big"
auspice_name: "mpox_lineage-B.1"

filter:
exclude: "config/exclude_accessions.txt"
exclude: "defaults/exclude_accessions.txt"
min_date: 2022
min_length: 180000

@@ -57,4 +57,4 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
maskfile: "config/mask.bed"
maskfile: "defaults/mask.bed"
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
auspice_config: "config/mpxv/auspice_config.json"
include: "config/mpxv/include.txt"
reference: "config/reference.fasta"
genemap: "config/genemap.gff"
genbank_reference: "config/reference.gb"
lat_longs: "config/lat_longs.tsv"
description: "config/description.md"
clades: "config/clades.tsv"
tree_mask: "config/tree_mask.tsv"
auspice_config: "defaults/mpxv/auspice_config.json"
include: "defaults/mpxv/include.txt"
reference: "defaults/reference.fasta"
genemap: "defaults/genemap.gff"
genbank_reference: "defaults/reference.gb"
lat_longs: "defaults/lat_longs.tsv"
description: "defaults/description.md"
clades: "defaults/clades.tsv"
tree_mask: "defaults/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -17,7 +17,7 @@ build_name: "mpxv"
auspice_name: "mpox_all-clades"

filter:
exclude: "config/exclude_accessions.txt"
exclude: "defaults/exclude_accessions.txt"
min_date: 1950
min_length: 100000

@@ -74,4 +74,4 @@ recency: true
mask:
from_beginning: 1350
from_end: 6422
maskfile: "config/mask_overview.bed"
maskfile: "defaults/mask_overview.bed"
File renamed without changes.
File renamed without changes.