Basic SetOps

Adam Taranto edited this page Sep 23, 2024 · 3 revisions

This section introduces basic set operations with KmerCountTable objects.

oxli supports basic set operations between tables:

- union: Combined set of hashes from both tables
- intersection: Shared hashes
- difference: Hashes unique to table one
- symmetric_difference: Hashes unique to either table (not shared)

import oxli

# Create two KmerCountTable objects
kmer_table1 = oxli.KmerCountTable(ksize=3)
kmer_table2 = oxli.KmerCountTable(ksize=3)

# Count some k-mers
kmer_table1.count('AAA')
kmer_table1.count('TTT')
kmer_table1.count('AAC')

kmer_table2.count('AAA')
kmer_table2.count('AAG')

# Check the hashes for our kmers
print(f" Hash of 'AAA': {kmer_table1.hash_kmer('AAA')}") # 10679328328772601858
print(f" Hash of 'TTT': {kmer_table1.hash_kmer('TTT')}") # 10679328328772601858 (same as 'AAA', its reverse complement)
print(f" Hash of 'AAC': {kmer_table1.hash_kmer('AAC')}") # 6579496673972597301
print(f" Hash of 'AAG': {kmer_table2.hash_kmer('AAG')}") # 12774992397053849803

# Check the hashes in each table
print(f"{kmer_table1.hashes}") # [10679328328772601858, 6579496673972597301]
print(f"{kmer_table2.hashes}") # [10679328328772601858, 12774992397053849803]

# Use Python's set operators (implemented as dunder methods)
union_hashes = kmer_table1 | kmer_table2          # Union
intersection_hashes = kmer_table1 & kmer_table2   # Intersection
difference_hashes = kmer_table1 - kmer_table2     # Difference
symmetric_difference_hashes = kmer_table1 ^ kmer_table2 # Symmetric Difference

# Print the results
print("Union:", union_hashes)
print("Intersection:", intersection_hashes)
print("Difference:", difference_hashes)
print("Symmetric Difference:", symmetric_difference_hashes)

# Union: {12774992397053849803, 10679328328772601858, 6579496673972597301}
# Intersection: {10679328328772601858}
# Difference: {6579496673972597301}
# Symmetric Difference: {12774992397053849803, 6579496673972597301}
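
The results above behave like ordinary Python sets of hashes. As a minimal sketch that runs without oxli, the same four operations can be reproduced with built-in sets holding the hash values printed earlier. The variable names here are illustrative and are not part of the oxli API; the sketch also shows that difference depends on operand order, which the example above does not exercise.

```python
# Plain-Python stand-in (assumption: no oxli involved). The hash values
# are copied from the printed table contents above.
table1_hashes = {10679328328772601858, 6579496673972597301}
table2_hashes = {10679328328772601858, 12774992397053849803}

union = table1_hashes | table2_hashes
intersection = table1_hashes & table2_hashes
difference = table1_hashes - table2_hashes
symmetric_difference = table1_hashes ^ table2_hashes

# Difference is order-sensitive: swapping the operands gives the hashes
# that are unique to the second table instead.
reverse_difference = table2_hashes - table1_hashes

# The symmetric difference is the union minus the intersection, or
# equivalently the two one-sided differences combined.
assert symmetric_difference == union - intersection
assert symmetric_difference == difference | reverse_difference

# Sets are unordered, so sort before printing for a stable display.
print("Union:", sorted(union))
print("Reverse difference:", sorted(reverse_difference))
```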

Retrieve the counts in kmer_table1 for every hash in the union set:

# Use get_hash_array to fetch counts for a list of hashes
union_counts = kmer_table1.get_hash_array(list(union_hashes))

print(union_counts)
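# [0, 2, 1]
# One count per hash, in the order of the list passed in; 0 means the hash is not in kmer_table1.
# Set iteration order is arbitrary, so the order of the counts may differ between runs.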
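
Because a Python set has no fixed order, it can help to sort the hashes before the lookup and pair each hash with its count. The sketch below is a plain-Python illustration under stated assumptions: `table1_counts` and the local `get_hash_array` function are hypothetical stand-ins that mimic the behaviour shown above (one count per hash, 0 when absent), not oxli itself. With a real table, the sorted list would be passed to `kmer_table1.get_hash_array` instead.

```python
# Hypothetical stand-in for the counts held by kmer_table1 (hash -> count),
# matching the values shown above. This is not the oxli data structure.
table1_counts = {10679328328772601858: 2, 6579496673972597301: 1}


def get_hash_array(counts, hashes):
    """Mimic the lookup shown above: one count per hash, 0 if absent."""
    return [counts.get(h, 0) for h in hashes]


union_hashes = {12774992397053849803, 10679328328772601858, 6579496673972597301}

# Sort once so the hashes and their counts line up reproducibly.
ordered_hashes = sorted(union_hashes)
union_counts = get_hash_array(table1_counts, ordered_hashes)

# Pair each hash with its count for easier inspection.
count_by_hash = dict(zip(ordered_hashes, union_counts))

for kmer_hash, count in count_by_hash.items():
    print(kmer_hash, count)
```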