Implement a technique to remove artificially duplicated reads from within khmer #259

ctb · 2014-01-19T03:13:15Z

On Dec 17, 2013 2:06 PM, "C. Titus Brown" [email protected] wrote:

Haven't figured out an approach for khmer yet, but:

what about,

for each read,

    check to see if first 32-mer has been seen before
    if it has, discard read
    otherwise, store first 32-mer, keep read

I can think of a modification to enable this for transcriptomes/high
coverage metagenomes, too.

It only works for exact matches in the first 32 bases, so some tuning (20?
32? 16?) might be useful. We could also use some fraction of first 3
k-mers, etc.

Michael:

That's how I would do it. Doesn't need to be perfect.
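
For reference, the per-read check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not khmer code: it uses a plain in-memory `set` rather than any khmer data structure. The function name `dedup_by_prefix`, the `(name, sequence)` record shape, and the choice to keep reads shorter than `k` are all hypothetical. The prefix length `k` is left as a parameter because the thread suggests tuning it (32, 20, 16).

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical record shape for this sketch: (read name, sequence).
Read = Tuple[str, str]


def dedup_by_prefix(reads: Iterable[Read], k: int = 32) -> Iterator[Read]:
    """Yield reads whose first k bases have not been seen before.

    Only exact matches on the first k bases count as duplicates,
    as in the proposal above.
    """
    seen: Set[str] = set()
    for name, seq in reads:
        if len(seq) < k:
            # Assumption: a read too short to have a full k-mer prefix
            # cannot be checked, so it is kept unchanged.
            yield name, seq
            continue
        prefix = seq[:k].upper()
        if prefix in seen:
            # First k-mer seen before: discard the read.
            continue
        # Otherwise store the first k-mer and keep the read.
        seen.add(prefix)
        yield name, seq
```

An exact set grows with the number of distinct prefixes, so memory use on a large data set would need a different structure; that choice is outside this sketch.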

camillescott · 2014-01-31T03:22:27Z

This is a bad idea IMO. We already have known problems with removing branches from the de Bruijn graph with diginorm; in the event where we have 2 branches, say each with coverage ~5, the chances of throwing away an entire branch with this approach are very high.
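
A toy illustration of this concern, under contrived assumptions: two hypothetical haplotypes share 40 bases and then differ at one position, and five reads are drawn from each at the same start offsets (so each branch has coverage 5 at the variant). All sequences, read lengths, and the read order are made up for the sketch, and the dedup function is a plain-Python stand-in for the proposed check, not khmer code. Because the variant lies beyond the first 32 bases of every read, reads from the two branches have identical prefixes, and whichever branch is seen first wins at each start position.

```python
def dedup_by_prefix(seqs, k=32):
    """Keep only the first read seen for each distinct k-base prefix."""
    seen = set()
    kept = []
    for seq in seqs:
        prefix = seq[:k]
        if prefix in seen:
            continue
        seen.add(prefix)
        kept.append(seq)
    return kept


# Hypothetical sequences: 40 shared bases, one variant base, 20 shared bases.
LEFT = "ACGTTGCAAGCTTAGGCTAACGGATCCTGAAGTCCATGGA"
RIGHT = "TTGACCGTAAGGCTTACGAT"
hap_a = LEFT + "A" + RIGHT  # branch A
hap_b = LEFT + "C" + RIGHT  # branch B

READ_LEN = 50
starts = range(5)  # every read starts upstream of the variant and spans it

reads_a = [hap_a[s:s + READ_LEN] for s in starts]
reads_b = [hap_b[s:s + READ_LEN] for s in starts]

# Worst-case ordering (an assumption): all branch A reads arrive first.
kept = dedup_by_prefix(reads_a + reads_b)

kept_from_a = [r for r in kept if r in reads_a]
kept_from_b = [r for r in kept if r in reads_b]
print(len(kept_from_a), len(kept_from_b))
```

With this ordering every read supporting branch B is discarded. Real reads have varied start positions and arrive in mixed order, so this shows the mechanism only and says nothing about how often it happens in practice.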

ctb changed the title ~~Implement a technique to remote artificially duplicated reads from within khmer~~ Implement a technique to remove artificially duplicated reads from within khmer Jun 12, 2015
