Dump Hash Count Pairs
You can use the .dump() method to write hash:count pairs from a KmerCountTable to a tab-delimited output file.
Example data:
import oxli
# Demo table
kct = oxli.KmerCountTable(ksize=4)
kct.count("AAAA") # Count 'AAAA'
kct.count("TTTT") # Count revcomp of 'AAAA'
kct.count("AATT") # Count 'AATT'
kct.count("GGGG") # Count 'GGGG'
kct.count("GGGG") # Count again.
# Hashes
# 17832910516274425539 = AAAA/TTTT
# 382727017318141683 = AATT
# 73459868045630124 = GGGG
By default, dump() returns unsorted records, so the order will vary between runs.
kct.dump()
>>> [(17832910516274425539, 2), (382727017318141683, 1), (73459868045630124, 2)]
Use the sortcounts option to sort records by count, then by hash key:
kct.dump(sortcounts=True)
>>> [(382727017318141683, 1), (73459868045630124, 2), (17832910516274425539, 2)]
Use the sortkeys option to sort records by hash key:
kct.dump(sortkeys=True)
>>> [(73459868045630124, 2), (382727017318141683, 1), (17832910516274425539, 2)]
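The two orderings shown above can be reproduced with ordinary Python sorting, which may help when reasoning about what each option returns. This is a minimal sketch that does not call oxli: it assumes both options sort in ascending order, as the example outputs suggest, and the variable names are illustrative only.

```python
# Unsorted (hash, count) records, copied from the example output above.
records = [
    (17832910516274425539, 2),
    (382727017318141683, 1),
    (73459868045630124, 2),
]

# Assumed equivalent of sortcounts=True: order by count, break ties by hash.
by_count_then_key = sorted(records, key=lambda rec: (rec[1], rec[0]))

# Assumed equivalent of sortkeys=True: order by hash only.
by_key = sorted(records, key=lambda rec: rec[0])

print(by_count_then_key)
print(by_key)
```

Sorting on the tuple (count, hash) is what makes the order of the two records with count 2 deterministic.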
Sorted hash:count pairs can be written to a tab-delimited text file by specifying an output target:
# Write tab-delimited records to kct.dump
kct.dump(sortcounts=True, file="kct.dump")
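A dumped file can be read back into Python with the standard csv module. The sketch below is not part of oxli: read_dump is a hypothetical helper, and it assumes each line holds one hash and one count separated by a tab, with no header row. To stay self-contained it writes a stand-in file with the example records instead of calling kct.dump().

```python
import csv
import os
import tempfile


def read_dump(path):
    """Read tab-delimited hash/count lines into a list of (hash, count) tuples.

    Assumes two integer columns per line and no header row.
    """
    records = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            records.append((int(row[0]), int(row[1])))
    return records


# Stand-in for a file produced by kct.dump(sortcounts=True, file="kct.dump").
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "kct.dump")
    with open(path, "w") as out:
        out.write("382727017318141683\t1\n")
        out.write("73459868045630124\t2\n")
        out.write("17832910516274425539\t2\n")
    loaded = read_dump(path)

print(loaded)
```

Python integers have arbitrary precision, so the 64-bit hash values survive the round trip unchanged.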
If no output file is specified, records are returned as a list of (hash, count) tuples (as above).
This list can be converted to a pandas dataframe:
import pandas as pd
table_dump = kct.dump(sortcounts=True)
df = pd.DataFrame(table_dump, columns=['Hash', 'Count'])
print(df)
>>>
'''
                   Hash  Count
0    382727017318141683      1
1     73459868045630124      2
2  17832910516274425539      2
'''
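If pandas is not available, the returned list of tuples can also be used directly. The following sketch uses only built-in Python and the example records from above; the variable names are illustrative and not part of oxli.

```python
# Records as returned by kct.dump(sortcounts=True) in the example above.
table_dump = [
    (382727017318141683, 1),
    (73459868045630124, 2),
    (17832910516274425539, 2),
]

# Map each hash to its count for direct lookup.
counts_by_hash = dict(table_dump)

# Sum of all counts in the dump.
total = sum(count for _, count in table_dump)

# Hashes that were counted more than once.
repeated = [hash_ for hash_, count in table_dump if count > 1]

print(counts_by_hash[73459868045630124])
print(total)
print(repeated)
```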
If the table is empty, dump() returns an empty list:
empty_kct = oxli.KmerCountTable(ksize=4)
empty_kct.dump()
>>> []