fixup: update dataset to incorporate fixes
* Use reconstructed roots for serotype-level and genotype-level datasets
* Update the all dataset with root and gap penalty
* Update the dengue/all dataset README.md file
j23414 committed Jun 5, 2024
1 parent 4994e89 commit 212f411
Showing 20 changed files with 135 additions and 803 deletions.
27 changes: 23 additions & 4 deletions nextclade/datasets/all/README.md
@@ -1,7 +1,26 @@
# Nextclade dataset for "Dengue Virus"
# De dataset

## Dataset attributes
| Key | Value |
| :-- | :-- |
| name | Dengue (serotype-level) |
| authors | [Nextstrain](https://nextstrain.org) |
| reference | NC_002640.1 |
| workflow | https://github.com/nextstrain/dengue/tree/main/nextclade |
| path | `nextstrain/dengue/all` |

Nextclade dataset

Read more about Nextclade datasets in Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
## Scope of this dataset

This dataset assigns serotype to dengue samples based on [criteria outlined by the WHO](https://pubmed.ncbi.nlm.nih.gov/26868382/) and tree placement nearest references [NC_001477.1 (DENV1)](https://www.ncbi.nlm.nih.gov/nuccore/NC_001477.1), [NC_001474.2 (DENV2)](https://www.ncbi.nlm.nih.gov/nuccore/NC_001474.2), [NC_001475.2 (DENV3)](https://www.ncbi.nlm.nih.gov/nuccore/NC_001475.2), and [NC_002640.1 (DENV4)](https://www.ncbi.nlm.nih.gov/nuccore/NC_002640.1).

## Features

This dataset supports:

- Assignment of serotypes
- Phylogenetic placement
- Sequence quality control (QC)

## What are Nextclade datasets

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
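To try the dataset this README describes, the following minimal sketch builds the Nextclade CLI invocations from Python. The dataset path `nextstrain/dengue/all` comes from the attribute table in the README. The local names (`data/dengue_all`, `sequences.fasta`, `results.tsv`) are placeholders, and the commands assume that a Nextclade v3 CLI is installed and that the dataset is published under that path.

```python
import shlex


def dataset_get_cmd(name: str, out_dir: str) -> list[str]:
    """Build the argv that downloads a Nextclade dataset by name."""
    return ["nextclade", "dataset", "get", "--name", name, "--output-dir", out_dir]


def run_cmd(dataset_dir: str, fasta: str, tsv: str) -> list[str]:
    """Build the argv that runs Nextclade on a FASTA file with a local dataset."""
    return ["nextclade", "run", "--input-dataset", dataset_dir, "--output-tsv", tsv, fasta]


if __name__ == "__main__":
    # Placeholder paths; the commands are printed, not executed.
    commands = [
        dataset_get_cmd("nextstrain/dengue/all", "data/dengue_all"),
        run_cmd("data/dengue_all", "sequences.fasta", "results.tsv"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch usable on machines without Nextclade; pass a list to `subprocess.run(cmd, check=True)` to actually run it.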
26 changes: 13 additions & 13 deletions nextclade/datasets/all/genome_annotation.gff3
@@ -1,14 +1,14 @@
##gff-version 3
- ##sequence-region NC_002640.1 1 10649
- NC_002640.1 feature gene 102 440 . + . codon_start=1;gene=C;gene_name=C;
- NC_002640.1 feature gene 441 713 . + . codon_start=1;gene=pr;gene_name=pr;
- NC_002640.1 feature gene 441 938 . + . codon_start=1;gene=M;gene_name=M;
- NC_002640.1 feature gene 939 2423 . + . codon_start=1;gene=E;gene_name=E;
- NC_002640.1 feature gene 2424 3479 . + . codon_start=1;gene=NS1;gene_name=NS1;
- NC_002640.1 feature gene 3480 4133 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
- NC_002640.1 feature gene 4134 4523 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
- NC_002640.1 feature gene 4524 6377 . + . codon_start=1;gene=NS3;gene_name=NS3;
- NC_002640.1 feature gene 6378 6758 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
- NC_002640.1 feature gene 6759 6827 . + . codon_start=1;gene=2K;gene_name=2K;
- NC_002640.1 feature gene 6828 7562 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
- NC_002640.1 feature gene 7563 10262 . + . codon_start=1;gene=NS5;gene_name=NS5;
+ ##sequence-region Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome 1 10649
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 102 440 . + . codon_start=1;gene=C;gene_name=C;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 441 713 . + . codon_start=1;gene=pr;gene_name=pr;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 441 938 . + . codon_start=1;gene=M;gene_name=M;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 939 2423 . + . codon_start=1;gene=E;gene_name=E;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 2424 3479 . + . codon_start=1;gene=NS1;gene_name=NS1;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 3480 4133 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 4134 4523 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 4524 6377 . + . codon_start=1;gene=NS3;gene_name=NS3;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 6378 6758 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 6759 6827 . + . codon_start=1;gene=2K;gene_name=2K;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 6828 7562 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome feature gene 7563 10262 . + . codon_start=1;gene=NS5;gene_name=NS5;
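The annotation change in this file renames the sequence ID while leaving every gene coordinate as it was. A minimal sketch of how that could be checked is given below, using three feature lines copied from the diff. The helper names (`parse_features`, `same_coordinates`, `codon_aligned`) are hypothetical, and the parser splits on any whitespace, which only works because these attribute strings contain no spaces.

```python
OLD = """\
##gff-version 3
NC_002640.1 feature gene 102 440 . + . codon_start=1;gene=C;gene_name=C;
NC_002640.1 feature gene 441 713 . + . codon_start=1;gene=pr;gene_name=pr;
NC_002640.1 feature gene 441 938 . + . codon_start=1;gene=M;gene_name=M;
"""

ROOT_ID = "Reconstructed_root_sequence_of_https_nextstrain_org_dengue/all/genome"
# Same features with only the sequence ID (column 1) swapped, as in the diff.
NEW = OLD.replace("NC_002640.1", ROOT_ID)


def parse_features(gff_text: str) -> list[tuple[str, int, int]]:
    """Return (gene, start, end) per feature line, ignoring the seqid column."""
    features = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split()
        # Columns: seqid source type start end score strand phase attributes
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
        features.append((attrs["gene"], int(cols[3]), int(cols[4])))
    return features


def same_coordinates(old_text: str, new_text: str) -> bool:
    """True if both annotations list the same genes at the same coordinates."""
    return parse_features(old_text) == parse_features(new_text)


def codon_aligned(features: list[tuple[str, int, int]]) -> bool:
    """True if every gene length (inclusive coordinates) is a multiple of 3."""
    return all((end - start + 1) % 3 == 0 for _, start, end in features)


if __name__ == "__main__":
    print(same_coordinates(OLD, NEW))
    print(codon_aligned(parse_features(NEW)))
```

The codon-alignment check is included because this commit also enables the frame-shift and stop-codon QC rules, which depend on gene boundaries that fall on codon boundaries.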
8 changes: 6 additions & 2 deletions nextclade/datasets/all/pathogen.json
@@ -1,5 +1,9 @@
{
"alignmentParams": {
+ "penaltyGapOpen": 8,
+ "penaltyGapOpenInFrame": 12,
+ "penaltyGapOpenOutOfFrame": 14,
+ "gapAlignmentSide": "left",
"minSeedCover": 0.01,
"minLength": 1000
},
@@ -30,7 +34,7 @@
},
"qc": {
"frameShifts": {
- "enabled": false
+ "enabled": true
},
"missingData": {
"enabled": false,
@@ -56,7 +60,7 @@
"windowSize": 100
},
"stopCodons": {
- "enabled": false
+ "enabled": true
}
},
"schemaVersion": "3.0.0",
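The three gap-open penalties added to `alignmentParams` are ordered so that gaps are cheapest outside genes, more expensive at codon boundaries inside genes, and most expensive when they would break the reading frame. The sketch below is a deliberately simplified model of that ordering, not Nextclade's alignment code: the function `gap_open_penalty` is hypothetical, it treats a gap as in frame when it opens on the first base of a codon, and it uses only gene C (102 to 440) from the annotation in this commit.

```python
import json

# Values copied from the alignmentParams block in this diff.
ALIGNMENT_PARAMS = json.loads("""
{
  "penaltyGapOpen": 8,
  "penaltyGapOpenInFrame": 12,
  "penaltyGapOpenOutOfFrame": 14
}
""")

# Gene C from genome_annotation.gff3 (1-based, inclusive coordinates).
GENES = [(102, 440)]


def gap_open_penalty(pos: int, genes: list[tuple[int, int]], params: dict) -> int:
    """Pick a gap-open penalty for a 1-based reference position (simplified model)."""
    for start, end in genes:
        if start <= pos <= end:
            # Inside a gene: cheaper if the gap opens at the start of a codon.
            in_frame = (pos - start) % 3 == 0
            key = "penaltyGapOpenInFrame" if in_frame else "penaltyGapOpenOutOfFrame"
            return params[key]
    # Outside every gene: the base penalty applies.
    return params["penaltyGapOpen"]


if __name__ == "__main__":
    for pos in (50, 102, 103):
        print(pos, gap_open_penalty(pos, GENES, ALIGNMENT_PARAMS))
```

Under this model, raising the in-gene penalties above the base penalty pushes ambiguous gaps into untranslated regions or onto codon boundaries, which fits with enabling the frame-shift QC rule in the same commit.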
2 changes: 1 addition & 1 deletion nextclade/datasets/all/tree.json

Large diffs are not rendered by default.

26 changes: 13 additions & 13 deletions nextclade/datasets/denv1/genome_annotation.gff3
@@ -1,14 +1,14 @@
##gff-version 3
- ##sequence-region NC_001477.1 1 10735
- NC_001477.1 feature gene 95 436 . + . codon_start=1;gene=C;gene_name=C;
- NC_001477.1 feature gene 437 709 . + . codon_start=1;gene=pr;gene_name=pr;
- NC_001477.1 feature gene 437 934 . + . codon_start=1;gene=M;gene_name=M;
- NC_001477.1 feature gene 935 2419 . + . codon_start=1;gene=E;gene_name=E;
- NC_001477.1 feature gene 2420 3475 . + . codon_start=1;gene=NS1;gene_name=NS1;
- NC_001477.1 feature gene 3476 4129 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
- NC_001477.1 feature gene 4130 4519 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
- NC_001477.1 feature gene 4520 6376 . + . codon_start=1;gene=NS3;gene_name=NS3;
- NC_001477.1 feature gene 6377 6757 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
- NC_001477.1 feature gene 6758 6826 . + . codon_start=1;gene=2K;gene_name=2K;
- NC_001477.1 feature gene 6827 7573 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
- NC_001477.1 feature gene 7574 10270 . + . codon_start=1;gene=NS5;gene_name=NS5;
+ ##sequence-region Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome 1 10735
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 95 436 . + . codon_start=1;gene=C;gene_name=C;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 437 709 . + . codon_start=1;gene=pr;gene_name=pr;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 437 934 . + . codon_start=1;gene=M;gene_name=M;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 935 2419 . + . codon_start=1;gene=E;gene_name=E;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 2420 3475 . + . codon_start=1;gene=NS1;gene_name=NS1;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 3476 4129 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 4130 4519 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 4520 6376 . + . codon_start=1;gene=NS3;gene_name=NS3;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 6377 6757 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 6758 6826 . + . codon_start=1;gene=2K;gene_name=2K;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 6827 7573 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv1/genome feature gene 7574 10270 . + . codon_start=1;gene=NS5;gene_name=NS5;
9 changes: 7 additions & 2 deletions nextclade/datasets/denv1/pathogen.json
@@ -1,5 +1,9 @@
{
"alignmentParams": {
+ "penaltyGapOpen": 8,
+ "penaltyGapOpenInFrame": 12,
+ "penaltyGapOpenOutOfFrame": 14,
+ "gapAlignmentSide": "left",
"minSeedCover": 0.1,
"minLength": 1000
},
@@ -17,6 +21,7 @@
"experimental": true,
"files": {
"changelog": "CHANGELOG.md",
+ "examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
@@ -29,7 +34,7 @@
},
"qc": {
"frameShifts": {
- "enabled": false
+ "enabled": true
},
"missingData": {
"enabled": false,
@@ -55,7 +60,7 @@
"windowSize": 100
},
"stopCodons": {
- "enabled": false
+ "enabled": true
}
},
"schemaVersion": "3.0.0",
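This file switches the `frameShifts` and `stopCodons` QC rules from disabled to enabled. As a rough illustration of what such rules look for in principle, the sketch below flags indels whose length is not a multiple of three and stop codons that appear before the last codon of a coding sequence. It does not reproduce Nextclade's QC scoring; the function names and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return indel_length % 3 != 0


def premature_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of stop codons before the final codon."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


if __name__ == "__main__":
    print(is_frameshift(2), is_frameshift(3))
    print(premature_stops("ATGTAAGGGTGA"))
```

Both checks depend on a correct alignment, which is presumably why the gap penalties are tuned in the same commit.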
182 changes: 2 additions & 180 deletions nextclade/datasets/denv1/reference.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion nextclade/datasets/denv1/tree.json

Large diffs are not rendered by default.

26 changes: 13 additions & 13 deletions nextclade/datasets/denv2/genome_annotation.gff3
@@ -1,14 +1,14 @@
##gff-version 3
- ##sequence-region NC_001474.2 1 10723
- NC_001474.2 feature gene 97 438 . + . codon_start=1;gene=C;gene_name=C;
- NC_001474.2 feature gene 439 711 . + . codon_start=1;gene=pr;gene_name=pr;
- NC_001474.2 feature gene 439 936 . + . codon_start=1;gene=M;gene_name=M;
- NC_001474.2 feature gene 937 2421 . + . codon_start=1;gene=E;gene_name=E;
- NC_001474.2 feature gene 2422 3477 . + . codon_start=1;gene=NS1;gene_name=NS1;
- NC_001474.2 feature gene 3478 4131 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
- NC_001474.2 feature gene 4132 4521 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
- NC_001474.2 feature gene 4522 6375 . + . codon_start=1;gene=NS3;gene_name=NS3;
- NC_001474.2 feature gene 6376 6756 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
- NC_001474.2 feature gene 6757 6825 . + . codon_start=1;gene=2K;gene_name=2K;
- NC_001474.2 feature gene 6826 7569 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
- NC_001474.2 feature gene 7570 10269 . + . codon_start=1;gene=NS5;gene_name=NS5;
+ ##sequence-region Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome 1 10723
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 97 438 . + . codon_start=1;gene=C;gene_name=C;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 439 711 . + . codon_start=1;gene=pr;gene_name=pr;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 439 936 . + . codon_start=1;gene=M;gene_name=M;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 937 2421 . + . codon_start=1;gene=E;gene_name=E;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 2422 3477 . + . codon_start=1;gene=NS1;gene_name=NS1;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 3478 4131 . + . codon_start=1;gene=NS2A;gene_name=NS2A;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 4132 4521 . + . codon_start=1;gene=NS2B;gene_name=NS2B;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 4522 6375 . + . codon_start=1;gene=NS3;gene_name=NS3;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 6376 6756 . + . codon_start=1;gene=NS4A;gene_name=NS4A;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 6757 6825 . + . codon_start=1;gene=2K;gene_name=2K;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 6826 7569 . + . codon_start=1;gene=NS4B;gene_name=NS4B;
+ Reconstructed_root_sequence_of_https_nextstrain_org_dengue/denv2/genome feature gene 7570 10269 . + . codon_start=1;gene=NS5;gene_name=NS5;
9 changes: 7 additions & 2 deletions nextclade/datasets/denv2/pathogen.json
@@ -1,5 +1,9 @@
{
"alignmentParams": {
+ "penaltyGapOpen": 8,
+ "penaltyGapOpenInFrame": 12,
+ "penaltyGapOpenOutOfFrame": 14,
+ "gapAlignmentSide": "left",
"minSeedCover": 0.1,
"minLength": 1000
},
@@ -17,6 +21,7 @@
"experimental": true,
"files": {
"changelog": "CHANGELOG.md",
+ "examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
@@ -29,7 +34,7 @@
},
"qc": {
"frameShifts": {
- "enabled": false
+ "enabled": true
},
"missingData": {
"enabled": false,
@@ -55,7 +60,7 @@
"windowSize": 100
},
"stopCodons": {
- "enabled": false
+ "enabled": true
}
},
"schemaVersion": "3.0.0",