
Handling Bad Kmers

Adam Taranto edited this page Sep 24, 2024 · 5 revisions

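This section explains how to handle bad k-mers during counting. A bad k-mer is one that contains a non-DNA character, such as the X characters in the examples below.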

Handling bad k-mers in DNA sequence

Create an empty table:

import oxli

# New KmerCountTable object that will count 31-mers
kct = oxli.KmerCountTable(ksize=31)

You can raise an error as soon as a bad k-mer is encountered:

kct.consume('XXXCGGAGGAAGCAAGAACAAAATATTTTTTCATGGG', allow_bad_kmers=False)

Traceback (most recent call last):
...
ValueError: bad k-mer encountered at position 0
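
To see why the error points at position 0, the check can be sketched in plain Python. This is an illustration only, not oxli's implementation: the helper name `first_bad_kmer_position` is hypothetical, and it assumes a k-mer is bad when it contains any character other than uppercase A, C, G, or T.

```python
# Assumption: only uppercase A, C, G and T count as valid DNA characters.
VALID_BASES = set("ACGT")


def first_bad_kmer_position(seq, ksize):
    """Return the start of the first k-mer window that contains a
    non-DNA character, or None if every window is valid.

    Hypothetical helper for illustration; not part of oxli.
    """
    for start in range(len(seq) - ksize + 1):
        window = seq[start:start + ksize]
        if not set(window) <= VALID_BASES:
            return start
    return None


seq = "XXXCGGAGGAAGCAAGAACAAAATATTTTTTCATGGG"
print(first_bad_kmer_position(seq, 31))  # 0: the very first window contains X
```

The first 31-character window already includes the leading X characters, so a strict pass over this sequence stops at the first k-mer.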

Alternatively, skip bad k-mers without raising an error (this is the default):

kct.consume('XXXCGGAGGAAGCAAGAACAAAATATTTTTTCATGGG', allow_bad_kmers=True)
# Returns 4, the number of valid 31-mers in the sequence

With allow_bad_kmers=True, every valid k-mer is counted and every k-mer that contains a non-DNA character is skipped:

kct.get("CGGAGGAAGCAAGAACAAAATATTTTTTCAT")
#2

kct.get("AGGAAGCAAGAACAAAATATTTTTTCATGGG")
#1
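
The value of 4 returned by consume above follows from simple window arithmetic, which the sketch below reproduces in plain Python. It does not use oxli and is not how oxli is implemented: the helper `valid_kmers` is hypothetical, and it assumes only uppercase A, C, G, and T are valid.

```python
# Assumption: only uppercase A, C, G and T count as valid DNA characters.
VALID_BASES = set("ACGT")


def valid_kmers(seq, ksize):
    """Yield (start, kmer) for every window made up only of valid bases.

    Hypothetical helper for illustration; not part of oxli.
    """
    for start in range(len(seq) - ksize + 1):
        kmer = seq[start:start + ksize]
        if set(kmer) <= VALID_BASES:
            yield start, kmer


seq = "XXXCGGAGGAAGCAAGAACAAAATATTTTTTCATGGG"
kept = list(valid_kmers(seq, 31))

print(len(seq) - 31 + 1)                # 7 windows in total
print([start for start, _ in kept])     # [3, 4, 5, 6]: the windows without X
print(kept[0][1])                       # CGGAGGAAGCAAGAACAAAATATTTTTTCAT
print(kept[-1][1])                      # AGGAAGCAAGAACAAAATATTTTTTCATGGG
```

The 37-character sequence has 7 windows of length 31. The three that overlap the leading X characters are skipped, leaving four valid k-mers, the first and last of which are the two k-mers queried above.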