Add and Subtract methods #62

Open
Adamtaranto opened this issue Sep 24, 2024 · 0 comments · May be fixed by #73
Adamtaranto commented Sep 24, 2024

Proposal: new methods .add() and .subtract() that modify the records in a KmerCountTable using a second table.

Create some new tables with different kmers and counts.

from oxli import KmerCountTable

# Create two KmerCountTable objects
kmer_table1 = KmerCountTable(ksize=3)
kmer_table2 = KmerCountTable(ksize=3)

# Count some k-mers
kmer_table1.count('AAA')
kmer_table1.count('TTT')
kmer_table1.count('AAC')

# Count for table 2
kmer_table2.count('AAA')
kmer_table2.count('AAG')

Check the table contents:

# Check the hashes for our kmers
print(f" Hash of 'AAA': {kmer_table1.hash_kmer('AAA')}") # 10679328328772601858
print(f" Hash of 'TTT': {kmer_table1.hash_kmer('TTT')}") # 10679328328772601858 (same hash as 'AAA', its reverse complement)
print(f" Hash of 'AAC': {kmer_table1.hash_kmer('AAC')}") # 6579496673972597301
print(f" Hash of 'AAG': {kmer_table2.hash_kmer('AAG')}") # 12774992397053849803

# Check the hashes in each table
print(f"{kmer_table1.hashes}") # [10679328328772601858, 6579496673972597301]
print(f"{kmer_table2.hashes}") # [10679328328772601858, 12774992397053849803]

# List kmers and counts
list(kmer_table1) # [(10679328328772601858, 2), (6579496673972597301, 1)]
list(kmer_table2) # [(12774992397053849803, 1), (10679328328772601858, 1)]

Proposed behaviour for add():

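  • Check that both tables use the same k; raise an error if they differ
  • If a hash is in both tables, sum the two counts and store the result in the self table
  • If a hash is only in the non-self table, add it to self as a new key with the non-self count
  • The non-self table is left unchanged
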
# Add table 2 to table 1.
kmer_table1.add(kmer_table2)

list(kmer_table1) # [(10679328328772601858, 3), (6579496673972597301, 1), (12774992397053849803, 1)]
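
The add() rules above can be modelled with plain dicts. This is a minimal sketch of the proposed semantics only, not oxli's implementation: `add_counts` and its parameters are hypothetical names, and each table is stood in for by a dict of hash to count plus its k.

```python
def add_counts(self_counts, other_counts, self_k, other_k):
    """Merge other_counts into self_counts in place (hypothetical helper).

    Both arguments are plain dicts mapping hash -> count, standing in for
    KmerCountTable objects. other_counts is never modified.
    """
    # Rule 1: both tables must use the same k.
    if self_k != other_k:
        raise ValueError(f"ksize mismatch: {self_k} != {other_k}")
    for hashval, count in other_counts.items():
        # Rules 2 and 3: sum counts for shared hashes, insert new hashes.
        self_counts[hashval] = self_counts.get(hashval, 0) + count
    return self_counts


# Same hashes and counts as the example tables in this issue.
table1 = {10679328328772601858: 2, 6579496673972597301: 1}
table2 = {12774992397053849803: 1, 10679328328772601858: 1}

add_counts(table1, table2, self_k=3, other_k=3)
print(table1)
```

The shared hash ends up with a count of 3 and the hash unique to table 2 is added with a count of 1, matching the expected listing above.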

Proposed behaviour for subtract():

  • Check that both tables use the same k; raise an error if they differ
  • If a hash key is in both tables, subtract the non-self count from the self count
  • Any count that would go negative after subtraction is set to zero
  • If a count is zero after subtraction, keep the key in self with a value of zero
  • If a hash is only in the non-self table, do nothing; do not add the key to self with a value of zero

# fresh KmerCountTable objects
kmer_table1 = KmerCountTable(ksize=3)
kmer_table2 = KmerCountTable(ksize=3)

# Count some k-mers
kmer_table1.count('AAA')
kmer_table1.count('TTT')
kmer_table1.count('AAC')

# Count for table 2
kmer_table2.count('AAA')
kmer_table2.count('AAG')

# List kmers and counts
list(kmer_table1) # [(10679328328772601858, 2), (6579496673972597301, 1)]
list(kmer_table2) # [(12774992397053849803, 1), (10679328328772601858, 1)]

# Subtract table 2 from table 1.
kmer_table1.subtract(kmer_table2)

# Inspect changes 
list(kmer_table1) # [(10679328328772601858, 1), (6579496673972597301, 1)] # AAA/TTT count reduced by 1
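
As with add(), the subtract() rules can be sketched with plain dicts. `subtract_counts` is a hypothetical helper that only illustrates the proposed semantics (clamp at zero, keep zeroed keys, ignore hashes unique to the non-self table); it is not oxli code.

```python
def subtract_counts(self_counts, other_counts, self_k, other_k):
    """Subtract other_counts from self_counts in place (hypothetical helper).

    Both arguments are plain dicts mapping hash -> count, standing in for
    KmerCountTable objects. other_counts is never modified.
    """
    if self_k != other_k:
        raise ValueError(f"ksize mismatch: {self_k} != {other_k}")
    for hashval, count in other_counts.items():
        # Hashes that exist only in the non-self table are skipped.
        if hashval in self_counts:
            # Clamp at zero; the key stays in self even when it reaches zero.
            self_counts[hashval] = max(0, self_counts[hashval] - count)
    return self_counts


# Same hashes and counts as the example tables in this issue.
table1 = {10679328328772601858: 2, 6579496673972597301: 1}
table2 = {12774992397053849803: 1, 10679328328772601858: 1}
subtract_counts(table1, table2, self_k=3, other_k=3)
print(table1)

# Over-subtraction: the count is clamped to zero and the key is retained.
small = {10679328328772601858: 1}
large = {10679328328772601858: 5}
subtract_counts(small, large, self_k=3, other_k=3)
print(small)
```

Keeping zeroed keys means a membership check on the hash still succeeds after subtraction, which is one of the behaviours worth confirming before this is implemented.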

@ctb thoughts on subtract behaviour?

We could also add __add__ and __sub__ dunder methods to support expressions like kmer_table1 + kmer_table2, though it might be unclear which table is being updated, so maybe not.

Would we ever want to apply a constant to all values in a table, e.g. table + 1, table * 10, table - 1? If so, that is probably a better use for the dunders.
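
One way the scalar idea could look, as a rough sketch rather than a proposal for the real class: `ToyCounts` is a hypothetical dict-backed stand-in for a table, and each operator returns a new object so the original is never modified. Clamping at zero is an assumption carried over from the subtract() rules above.

```python
import operator


class ToyCounts:
    """Hypothetical dict-backed stand-in for a k-mer count table."""

    def __init__(self, counts):
        self.counts = dict(counts)

    def _apply(self, constant, op):
        # Only integer constants are supported in this sketch.
        if not isinstance(constant, int):
            return NotImplemented
        # Build a new object; clamp at zero (assumed, as for subtract()).
        return ToyCounts(
            {h: max(0, op(c, constant)) for h, c in self.counts.items()}
        )

    def __add__(self, constant):
        return self._apply(constant, operator.add)

    def __sub__(self, constant):
        return self._apply(constant, operator.sub)

    def __mul__(self, constant):
        return self._apply(constant, operator.mul)


table = ToyCounts({10679328328772601858: 2, 6579496673972597301: 1})
print((table + 1).counts)
print((table * 10).counts)
print((table - 2).counts)
```

Returning a new object sidesteps the question of which table is being updated, at the cost of copying every count.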

Adamtaranto added the enhancement (New feature or request) label Sep 24, 2024
Adamtaranto self-assigned this Sep 24, 2024
Adamtaranto linked a pull request Oct 5, 2024 that will close this issue