Removing off-target sequences without ODSeq #68

Open
cmorganl opened this issue Jan 15, 2021 · 0 comments

Currently, treesapp create relies on the package ODSeq to find outliers in a multiple sequence alignment. This step is optional and is enabled with the --outdet_align flag.

The only issue with ODSeq is that it is no longer supported, and it's uncertain how long it will continue to be hosted at its current location (http://www.bioinf.ucd.ie/download/od-seq.tar.gz). A Bioconductor package exists for it on conda, but I'd rather not use it or create a conda recipe for the binary, so it is currently downloaded and compiled by the users who want it (probably not many).

An added benefit of ditching ODSeq is that it would be one less dependency.

An alternative method should be implemented to replace this. I propose two options:

  1. Provide a 'subtraction' fasta file to treesapp create and it will cluster the regular input sequences with the 'subtraction' set. Any sequences that are clustered with those from 'subtraction' will be removed.
  2. treesapp create will build a profile HMM from the input sequences (probably cluster them first). The input sequences will then be aligned to the HMM and those that align poorly will be removed. Something like this was used by GraftM.
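As a rough sketch of the filtering step in option 1, assuming the clustering has already been run and its output parsed into a mapping of cluster ID to member names. The function name, the `clusters` structure and the sequence names are all hypothetical; nothing here is existing TreeSAPP code.

```python
from typing import Dict, Iterable, List, Set


def remove_subtraction_clustered(
    clusters: Dict[str, List[str]],
    subtraction_names: Iterable[str],
) -> Set[str]:
    """Return the input sequence names that share no cluster with a subtraction sequence.

    clusters: hypothetical mapping of cluster ID -> names of every member
              sequence (input and subtraction sequences clustered together).
    """
    subtraction = set(subtraction_names)
    kept: Set[str] = set()
    for members in clusters.values():
        if subtraction.intersection(members):
            # At least one off-target sequence landed here: drop the whole cluster.
            continue
        kept.update(members)
    return kept


# Made-up example: two reference sequences cluster alone, one clusters with
# an off-target sequence, and one off-target sequence forms its own cluster.
clusters = {
    "c1": ["ref_A", "ref_B"],
    "c2": ["ref_C", "offtarget_1"],
    "c3": ["offtarget_2"],
    "c4": ["ref_D"],
}
kept = remove_subtraction_clustered(clusters, ["offtarget_1", "offtarget_2"])
print(sorted(kept))  # ['ref_A', 'ref_B', 'ref_D']
```

How aggressive this is depends entirely on the clustering identity threshold, which would probably need to be exposed as a parameter.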

These should complement, not duplicate, the existing method for filtering off-target reference sequences: supplying a profile HMM through the treesapp create argument '--profile'.
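For option 2, the open question is what counts as aligning "poorly". One possible rule, sketched below, is to treat each sequence's bit score against the self-built HMM as the measure and drop low outliers using the median absolute deviation (MAD). The MAD cutoff, the `n_mads` default and the function name are assumptions for illustration; this is not how GraftM or TreeSAPP does it, and the scores are assumed to have been parsed already from the aligner's output.

```python
from statistics import median
from typing import Dict, List


def filter_poor_alignments(bit_scores: Dict[str, float], n_mads: float = 3.0) -> List[str]:
    """Return names of sequences whose bit score is not a low outlier.

    bit_scores: hypothetical mapping of sequence name -> bit score against
                the profile HMM built from the input sequences.
    n_mads: assumed cutoff, in MADs below the median score.
    """
    scores = list(bit_scores.values())
    if not scores:
        return []
    centre = median(scores)
    mad = median([abs(s - centre) for s in scores])
    cutoff = centre - n_mads * mad
    # Only low scores are penalised; unusually high scores are kept.
    return [name for name, score in bit_scores.items() if score >= cutoff]


# Made-up scores: four sequences score similarly, one scores far lower.
bit_scores = {
    "ref_A": 310.0,
    "ref_B": 305.0,
    "ref_C": 298.0,
    "ref_D": 312.0,
    "ref_E": 95.0,
}
kept = filter_poor_alignments(bit_scores)
print(kept)  # ['ref_A', 'ref_B', 'ref_C', 'ref_D']
```

A caveat with any rule like this: if off-target sequences make up a large share of the input, they also shape the HMM and the score distribution, which is one reason to cluster first and to keep the '--profile' route as a separate filter.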
