TreeSAPP classify bug for TaxIDs that have been merged #99

Open
janstett opened this issue Oct 1, 2024 · 0 comments
janstett commented Oct 1, 2024

NCBI sometimes merges old TaxIDs into new ones. Depending on how the sequence header is structured, TreeSAPP can't match a merged TaxID to a lineage and erroneously assigns the sequence to Root.

For example, in the SoxZ package:

1525715.IX54_08960 is read as having the TaxID 1525715.
However, 1525715 has been merged into 1545044,
so TreeSAPP classifies this sequence as Root.

However, the taxonomy should be:
Bacteria; Pseudomonadota; Alphaproteobacteria; Rhodobacterales; Paracoccaceae; Paracoccus; Paracoccus sanguinis
https://www.ncbi.nlm.nih.gov/protein/694216822
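
One possible workaround, sketched here under assumptions rather than taken from TreeSAPP's code: translate old TaxIDs through the `merged.dmp` table that ships with the NCBI taxonomy dump before looking up the lineage. The function names `load_merged` and `resolve_taxid` are hypothetical, and the in-memory sample only holds the one merge reported above.

```python
import io


def load_merged(handle):
    """Parse merged.dmp-style lines ("old\t|\tnew\t|") into {old_taxid: new_taxid}."""
    merged = {}
    for line in handle:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[fields[0]] = fields[1]
    return merged


def resolve_taxid(taxid, merged):
    """Follow merges until a TaxID with no further merge is reached."""
    seen = set()
    while taxid in merged and taxid not in seen:
        seen.add(taxid)  # guard against an accidental cycle in the table
        taxid = merged[taxid]
    return taxid


# Stand-in for the real merged.dmp file, holding only the merge from this issue.
sample = io.StringIO("1525715\t|\t1545044\t|\n")
merged = load_merged(sample)
current = resolve_taxid("1525715", merged)
print(current)
```

With the real file, `load_merged(open("merged.dmp"))` would replace the sample, and the resolved TaxID would then be passed to whatever lineage lookup is already in use.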

When the protein accession is listed without a TaxID prefix, this issue is avoided. It seems to mainly affect sequences that originate from eggNOG.
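
To illustrate the two header shapes, here is a rough sketch of how a TaxID prefix could be told apart from a plain accession. The regular expression and the name `split_header` are assumptions for illustration only; this is not how TreeSAPP actually parses headers.

```python
import re

# Assumed shape of a prefixed header: "<numeric TaxID>.<identifier>"
TAXID_PREFIX = re.compile(r"^(\d+)\.(\S+)$")


def split_header(header):
    """Return (taxid, identifier); taxid is None when there is no numeric prefix."""
    tokens = header.lstrip(">").split()
    if not tokens:
        return None, ""
    name = tokens[0]  # ignore any description after the first whitespace
    match = TAXID_PREFIX.match(name)
    if match:
        return match.group(1), match.group(2)
    return None, name


# Header from this issue: the prefix is the (merged) TaxID 1525715.
taxid, identifier = split_header("1525715.IX54_08960")
print(taxid, identifier)
```

Headers that yield a TaxID are the ones that would need checking against the merged TaxIDs; headers without a prefix return `None` and would go through the accession-based lookup instead.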

  • TreeSAPP Version [e.g. 0.11.4]


janstett added the bug label Oct 1, 2024