Extracting from files

Adam Taranto edited this page Sep 23, 2024 · 3 revisions

This section describes how to extract k-mers from files.

Import necessary modules:

import screed # FASTA/FASTQ parsing
import oxli   # This package!

Create a KmerCountTable with a k-mer size of 31:

# New KmerCountTable object that will count 31-mers
kct = oxli.KmerCountTable(ksize=31)

Open a FASTA file and consume the k-mers from every sequence in it.

consume() reports the number of k-mers consumed from the sequence it is given, so summing the values over all records gives the total for the file.

total = 0
for record in screed.open('example.fa'):
    total += kct.consume(record.sequence)
print(total)
# 349900 # Total k-mers consumed
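
As a rough illustration of what consuming a sequence involves, the sketch below slides a window of length k along a string and yields each k-mer. It is plain Python and is not oxli's implementation; the helper name `iter_kmers` is hypothetical, and the handling of ambiguous bases such as N is left out.

```python
def iter_kmers(sequence, ksize):
    """Yield every overlapping k-mer of length ksize from sequence."""
    # A sequence of length L contains L - ksize + 1 k-mers (none if L < ksize).
    for start in range(len(sequence) - ksize + 1):
        yield sequence[start:start + ksize]


kmers = list(iter_kmers("ACGTACGT", 3))
print(kmers)       # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
print(len(kmers))  # 6, i.e. 8 - 3 + 1
```

Under this model, a sequence shorter than k contributes no k-mers, and the total for a file is the sum over its records.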

Use .get() to look up the count of CGGAGGAAGCAAGAACAAAATATTTTTTCAT in the count table:

kct.get('CGGAGGAAGCAAGAACAAAATATTTTTTCAT')
# 1 # get() returns the k-mer count

A k-mer and its reverse complement are counted as one, so both will always have the same value.

kct.get('ATGAAAAAATATTTTGTTCTTGCTTCCTCCG') # reverse complement of 'CGGAGGAAGCAAGAACAAAATATTTTTTCAT'
# 1
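
One common way to get this behaviour is to reduce each k-mer to a canonical form before counting. The sketch below picks the lexicographically smaller of a k-mer and its reverse complement; that choice is an assumption made for illustration, and oxli may derive its canonical form differently. The names `reverse_complement` and `canonical` are hypothetical helpers, not part of the oxli API.

```python
from collections import Counter

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the smaller of kmer and its reverse complement (assumed convention)."""
    return min(kmer, reverse_complement(kmer))


forward = "CGGAGGAAGCAAGAACAAAATATTTTTTCAT"
reverse = reverse_complement(forward)

# Both strands map to the same key, so they share a single count.
counts = Counter()
counts[canonical(forward)] += 1
print(reverse)                                 # ATGAAAAAATATTTTGTTCTTGCTTCCTCCG
print(canonical(forward) == canonical(reverse))  # True
print(counts[canonical(reverse)])              # 1
```

Because lookups go through the same canonical key, querying either strand returns the same count, which matches the behaviour of get() shown above.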