From b4ca65c5403c88a1b67985a5cfc577bf78a16af4 Mon Sep 17 00:00:00 2001
From: Marcela Torres <83386529+DOH-LMT2303@users.noreply.github.com>
Date: Wed, 8 Jan 2025 14:04:12 -0800
Subject: [PATCH] Update README.md with determination of min genome length threshold

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 2284581..4285fd2 100644
--- a/README.md
+++ b/README.md
@@ -74,4 +74,8 @@ For global lineage designations, we query [pathoplexus](https://pathoplexus.org/
 ### Host mapping to Host Genus and Host Type
 We further refined the information in the NCBI Host column by categorizing it into **Host_Genus** and **Host_Type**, creating broader groupings for more effective data analysis. For example, the **Host** _Homo sapiens_ is classified under **Host_Genus** as _Homo_ and **Host_Type** as Human. This broader categorization is particularly useful for visualizing the phylogenetic tree. Instead of distinguishing between individual mosquito species, you can use the broader categories like **Host_Genus** _Culex_ or the higher-level category **Host_Type** Mosquito to color the tips of the tree.
+### Determination of Minimum Genome Length
+By default, Nextstrain's phylogenetic workflow excludes sequences with less than 90% genome coverage, because the alignment of short sequences can be unreliable. However, due to the limited number of WNV sequences available in NCBI, we evaluated minimum genome length thresholds of 90%, 80%, 75%, and 70%. For each threshold, we ran the Washington-focused build and compared: (1) the number of sequences included, (2) the locations of data gaps in the alignment files, inspected with an alignment viewer, and (3) the topology and lineage assignments in the phylogenetic tree outputs. We concluded that a minimum genome length of 75% offered the best balance between the number of sequences included and alignment quality. Finally, we validated this threshold using the global build.
+* To modify the minimum nucleotide sequence length in the WNV global build, enter the desired threshold in the `--min-length` parameter listed in the [defaults/config.yaml](https://github.com/nextstrain/WNV/blob/main/phylogenetic/defaults/config.yaml) file.
+* To modify the minimum nucleotide sequence length in the WNV Washington-focused build, enter the desired threshold in the `--min-length` parameter listed in the [washington-state/config.yaml](https://github.com/nextstrain/WNV/blob/main/phylogenetic/build-configs/washington-state/config.yaml) file.
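
As a rough aid for choosing a value, the sketch below converts a coverage percentage into a nucleotide count, since Augur's `augur filter --min-length` option takes a length in bases and not a percentage. The function name and the reference length of 11,029 nt are illustrative assumptions and are not taken from the workflow; substitute the length of the reference genome the build actually uses.

```python
def min_length_for_coverage(reference_length: int, percent: int) -> int:
    """Return the smallest sequence length (in nucleotides) that covers
    at least `percent` percent of a reference of `reference_length` nt."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (reference_length * percent + 99) // 100


# Assumption: an approximate WNV reference length used only for illustration.
ASSUMED_REFERENCE_LENGTH = 11029

if __name__ == "__main__":
    # The four thresholds evaluated in the README section above.
    for percent in (90, 80, 75, 70):
        length = min_length_for_coverage(ASSUMED_REFERENCE_LENGTH, percent)
        print(f"{percent}% coverage -> min length {length} nt")
```

The resulting number is what would be entered for `--min-length` in the relevant config.yaml, assuming the workflow passes that value to `augur filter` unchanged.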