I haven't figured out an approach for khmer yet, but what about this:
for each read,
check whether its first 32-mer has been seen before;
if it has, discard the read;
otherwise, store the first 32-mer and keep the read.
I can think of a modification to enable this for transcriptomes and
high-coverage metagenomes, too.
It only catches exact matches in the first 32 bases, so some tuning of the
prefix length (16? 20? 32?) might be useful. We could also require a match
on some fraction of the first 3 k-mers, etc.
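The proposal above can be sketched in a few lines of Python. This is a minimal sketch, not khmer code: reads are assumed to be `(name, sequence)` tuples, the function names `dedup_by_prefix` and `dedup_by_kmer_fraction` are hypothetical, and an exact Python `set` is used for simplicity, so memory grows with the number of distinct prefixes. The second function is one possible reading of "some fraction of the first 3 k-mers", assuming overlapping k-mers starting at positions 0, 1 and 2.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Read = Tuple[str, str]  # (name, sequence); an assumed representation


def dedup_by_prefix(reads: Iterable[Read], prefix_len: int = 32) -> Iterator[Read]:
    """Keep a read only if its first `prefix_len` bases have not been seen before."""
    seen: Set[str] = set()
    for name, seq in reads:
        prefix = seq[:prefix_len]  # reads shorter than prefix_len use the whole read
        if prefix in seen:
            continue  # prefix seen before: treat as a duplicate and discard
        seen.add(prefix)  # first time: remember the prefix and keep the read
        yield name, seq


def dedup_by_kmer_fraction(
    reads: Iterable[Read],
    k: int = 20,
    n_kmers: int = 3,
    min_fraction: float = 2 / 3,
) -> Iterator[Read]:
    """Hypothetical variant: discard a read if at least `min_fraction` of its
    first `n_kmers` overlapping k-mers have been seen before."""
    seen: Set[str] = set()
    for name, seq in reads:
        kmers: List[str] = [seq[i:i + k] for i in range(n_kmers) if i + k <= len(seq)]
        if not kmers:
            yield name, seq  # read too short to form a k-mer: keep it
            continue
        hits = sum(1 for kmer in kmers if kmer in seen)
        if hits / len(kmers) >= min_fraction:
            continue  # enough leading k-mers already seen: discard
        seen.update(kmers)
        yield name, seq


if __name__ == "__main__":
    reads = [("r1", "AAAACCCC"), ("r2", "AAAAGGGG"), ("r3", "TTTTCCCC")]
    # With a 4-base prefix, r2 shares "AAAA" with r1 and is dropped.
    print([name for name, _ in dedup_by_prefix(reads, prefix_len=4)])
```

Note that any two reads sharing the same prefix are collapsed, whether they are artificial duplicates or independent reads that happen to start at the same position; the prefix length and match fraction control how often that happens.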
Michael:
That's how I would do it. Doesn't need to be perfect.
This is a bad idea IMO. We already have known problems with removing branches from the de Bruijn graph with diginorm; when there are 2 branches, each with coverage of, say, ~5, the chances of throwing away an entire branch with this approach are very high.
ctb changed the title from "Implement a technique to remote artificially duplicated reads from within khmer" to "Implement a technique to remove artificially duplicated reads from within khmer" on Jun 12, 2015
On Dec 17, 2013 2:06 PM, "C. Titus Brown" [email protected] wrote:
Michael:
That's how I would do it. Doesn't need to be perfect.