Correct geo_loc_name for PP_00000UB.2 (Ebolavirus Sudan) #2

Open
pvanheus opened this issue Oct 28, 2024 · 2 comments

Comments

@pvanheus

Describe the possible issue

This is a sequence ingested from NCBI. In NCBI it has accession KU182912.1 and isolate name "Sudan virus/H. sapiens-tc/SDN/2000/Gulu-200011676". In this build of Ebolavirus Sudan sequences it clearly clusters with other sequences from the Gulu, Uganda outbreak of 2000.
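As a rough illustration of how the isolate name can be read, here is a minimal sketch that splits it into its slash-separated parts. The field labels (virus, host, country code, year, isolate) and the helper `split_isolate_name` are assumptions made for illustration; they are not taken from the NCBI record or from any Pathoplexus tooling.

```python
# Hypothetical helper for illustration only; not part of NCBI or Pathoplexus tooling.
def split_isolate_name(name: str) -> dict:
    """Split a five-part, slash-delimited isolate name into labelled fields.

    The field labels are an assumption about the naming layout.
    """
    virus, host, country_code, year, isolate = name.split("/")
    return {
        "virus": virus,
        "host": host,
        "country_code": country_code,
        "year": year,
        "isolate": isolate,
    }


parts = split_isolate_name("Sudan virus/H. sapiens-tc/SDN/2000/Gulu-200011676")

# The text before the hyphen in the isolate designation is the locality token.
locality = parts["isolate"].split("-")[0]

print(parts["year"], locality)
```

The locality token and year recovered this way are the parts of the name that tie the sequence to the Gulu outbreak of 2000.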

The NCBI record, however, states geo_loc_name="Sudan". This is certainly incorrect. The sequence was deposited in 2015, many years after the outbreak, and the authors likely made a mistake with the metadata. All attempts to contact the original sequence authors have, thus far, failed.

Evidence of the problem

The phylogeny below shows that Ebolavirus Sudan has two clades, each of which is restricted to a single country (Uganda and South Sudan).

[Image: phylogeny of Ebolavirus Sudan sequences]

The sequence in question is labeled Gulu-200011676 in the phylogeny.

There was no Ebolavirus Sudan outbreak in (South) Sudan in 2000, the year listed as the collection date for this sequence in GenBank. See the list of Ebola outbreaks from the US CDC.

Suggested change

The geo_loc_name should be changed to Uganda.

Full list of affected sequences

PP_00000UB.2

@emily-smith1
Collaborator

I agree with Peter's suggested change, based on tree topology and the US CDC not listing any Ebola outbreaks outside of Uganda in 2000. Note that this sequence now has accession PP_00000UB.3 listed in Pathoplexus.

@m-a-martin
Collaborator

I agree with Emily and Peter that this seems to be a mislabelled country of origin, based on the sequence name, the Ebola outbreaks reported by the US CDC, and phylogenetic clustering with other Ugandan sequences. The collection country should be changed to Uganda.
