Merge pull request #292 from nextstrain/clade-i-ref
Use clade I reference KJ642613 for clade I build, mask correctly
jameshadfield authored Nov 20, 2024
2 parents 36b3e82 + e629de0 commit 12996b6
Showing 7 changed files with 9,270 additions and 6 deletions.
40 changes: 40 additions & 0 deletions phylogenetic/defaults/clade-i/auspice_config.json
@@ -56,6 +56,46 @@
"title": "Release Date",
"type": "categorical"
},
{
"key": "sra_accession",
"title": "SRA Accession",
"type": "categorical"
},
{
"key": "coverage",
"title": "Coverage",
"type": "continuous"
},
{
"key": "missing_data",
"title": "Missing Data",
"type": "continuous"
},
{
"key": "nonACGTN",
"title": "Non-ACGTN",
"type": "continuous"
},
{
"key": "institution",
"title": "Institution",
"type": "categorical"
},
{
"key": "division",
"title": "Division",
"type": "categorical"
},
{
"key": "location",
"title": "Location",
"type": "categorical"
},
{
"key": "abbr_authors",
"title": "Abbreviated Authors",
"type": "categorical"
},
{
"key": "date",
"title": "Collection date",
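
The eight coloring entries added in this hunk all follow the same `key` / `title` / `type` shape. A minimal sketch of a sanity check over such entries follows. `check_colorings` is a hypothetical helper, not part of the workflow, and the set of allowed `type` values is an assumption based on the Auspice config documentation.

```python
# Hypothetical helper for illustration; not part of the Nextstrain mpox workflow.
# Assumption: these are the coloring types Auspice accepts.
ALLOWED_TYPES = {"continuous", "temporal", "ordinal", "categorical", "boolean"}


def check_colorings(colorings):
    """Return a list of human-readable problems found in coloring entries."""
    problems = []
    seen = set()
    for i, entry in enumerate(colorings):
        key = entry.get("key")
        if not key:
            problems.append(f"entry {i}: missing 'key'")
            continue
        if key in seen:
            problems.append(f"{key}: duplicate key")
        seen.add(key)
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"{key}: unexpected type {entry.get('type')!r}")
    return problems


# The entries added by this commit, copied from the hunk above.
added = [
    {"key": "sra_accession", "title": "SRA Accession", "type": "categorical"},
    {"key": "coverage", "title": "Coverage", "type": "continuous"},
    {"key": "missing_data", "title": "Missing Data", "type": "continuous"},
    {"key": "nonACGTN", "title": "Non-ACGTN", "type": "continuous"},
    {"key": "institution", "title": "Institution", "type": "categorical"},
    {"key": "division", "title": "Division", "type": "categorical"},
    {"key": "location", "title": "Location", "type": "categorical"},
    {"key": "abbr_authors", "title": "Abbreviated Authors", "type": "categorical"},
]

print(check_colorings(added))
```

The check only looks at `key` and `type`; whether `title` is required is not asserted here.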
12 changes: 6 additions & 6 deletions phylogenetic/defaults/clade-i/config.yaml
@@ -1,12 +1,12 @@
-reference: "defaults/reference.fasta"
-genome_annotation: "defaults/genome_annotation.gff3"
-genbank_reference: "defaults/reference.gb"
+reference: "defaults/clade-i/reference.fasta"
+genome_annotation: "defaults/clade-i/genome_annotation.gff3"
+genbank_reference: "defaults/clade-i/reference.gb"
include: "defaults/clade-i/include.txt"
clades: "defaults/clades.tsv"
lat_longs: "defaults/lat_longs.tsv"
auspice_config: "defaults/clade-i/auspice_config.json"
description: "defaults/description.md"
-tree_mask: "defaults/tree_mask.tsv"
+tree_mask: "defaults/clade-i/tree_mask.tsv"

# Use `accession` as the ID column since `strain` currently contains duplicates¹.
# ¹ https://github.com/nextstrain/mpox/issues/33
@@ -18,7 +18,7 @@ auspice_name: "mpox_clade-I"

filter:
min_date: 1900
-min_length: 100000
+min_length: 170000


### Filter to only Clade I sequences
@@ -56,7 +56,7 @@ recency: true
mask:
from_beginning: 800
from_end: 6422
-maskfile: "defaults/mask.bed"
+maskfile: "defaults/clade-i/mask.bed"

colors:
ignore_categories: "division location"
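
The `mask` block combines two terminal masks (`from_beginning`, `from_end`) with the intervals in `maskfile`. A minimal sketch of what the terminal settings mean in coordinates follows, assuming they mask the first and last N sites of the alignment, as the names suggest. `terminal_mask_intervals` is a hypothetical helper, and the genome length is a round placeholder, not the actual length of KJ642613.

```python
# Hypothetical helper for illustration; not part of the workflow.
def terminal_mask_intervals(genome_length, from_beginning, from_end):
    """Return 0-based, half-open (start, end) intervals covering both termini."""
    if from_beginning + from_end > genome_length:
        raise ValueError("terminal masks overlap: genome is too short")
    return [(0, from_beginning), (genome_length - from_end, genome_length)]


# Placeholder length for illustration only (NOT the real length of KJ642613).
ILLUSTRATIVE_LENGTH = 200_000

# Values taken from the `mask` block of config.yaml.
intervals = terminal_mask_intervals(ILLUSTRATIVE_LENGTH, from_beginning=800, from_end=6422)
print(intervals)
```

Because `from_end` is counted from the end of the reference, switching to a reference of a different length shifts where that mask starts.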
362 changes: 362 additions & 0 deletions phylogenetic/defaults/clade-i/genome_annotation.gff3

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions phylogenetic/defaults/clade-i/mask.bed
@@ -0,0 +1,25 @@
Chrom ChromStart ChromEnd locus tag Comment
chr 8340 8480 indel variation and long repetitive elements
chr 17960 17980 Next to stretch of Ns, suspicious
chr 18930 19050 termini of clade Ib deletion are sometimes incorrectly called
chr 19900 20100 termini of clade Ib deletion are sometimes incorrectly called
chr 20250 20280 triple mutation in Ib is inconsistently called
chr 22890 22920 homopolymer stretch
chr 31570 31610 Indel often incorrectly called
chr 77870 77880 Likely reversion to reference in some INRB sequences only
chr 109460 109470 Lots of ambiguous right next to start of Ns
chr 109730 109750 Indel often incorrectly called
chr 123370 123400 Right next to stretch of Ns in INRB sequences only
chr 138000 138300 indel variation and long repetitive elements
chr 141700 141800 indel variation and long repetitive elements
chr 144750 144830 indel variation and long repetitive elements
chr 148440 148660 indel variation and long repetitive elements
chr 149970 150020 indel variation and long repetitive elements
chr 152170 152300 indel variation and long repetitive elements
chr 157520 157570 homopolymer/tandem repeats
chr 158790 158800 Mutation right next to stretch of Ns, INRB sequences only
chr 162580 162610 indel variation and long repetitive elements
chr 169000 169350 indel variation and long repetitive elements
chr 177250 177350 indel variation and long repetitive elements
chr 178500 178900 indel variation and long repetitive elements
chr 180650 180710 Indel sometimes called messily
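
Since the intervals in `mask.bed` are hand-curated, a quick consistency check can catch inverted or overlapping rows. The sketch below parses a whitespace-separated excerpt (the first three rows above) and sums the masked bases. It assumes the columns are separated as rendered here, that the `locus tag` column is empty, that all rows share one chromosome, and that coordinates follow BED's half-open convention. `parse_mask` and `check_intervals` are hypothetical helpers.

```python
# Hypothetical helpers for illustration; not part of the workflow.
SAMPLE = """\
Chrom ChromStart ChromEnd locus tag Comment
chr 8340 8480 indel variation and long repetitive elements
chr 17960 17980 Next to stretch of Ns, suspicious
chr 18930 19050 termini of clade Ib deletion are sometimes incorrectly called
"""


def parse_mask(text):
    """Parse rows into (chrom, start, end, comment) tuples, skipping the header."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        chrom, start, end, comment = line.split(None, 3)
        rows.append((chrom, int(start), int(end), comment))
    return rows


def check_intervals(rows):
    """Validate ordering of intervals and return the total number of masked bases."""
    prev_end = 0
    for chrom, start, end, _ in rows:
        if start >= end:
            raise ValueError(f"{chrom}:{start}-{end} is empty or inverted")
        if start < prev_end:
            raise ValueError(f"{chrom}:{start}-{end} overlaps or precedes the previous row")
        prev_end = end
    return sum(end - start for _, start, end, _ in rows)


rows = parse_mask(SAMPLE)
print(len(rows), check_intervals(rows))
```

Running the same check over the full file would require reading it with the file's real delimiter, which is not visible in this rendering.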