Question: Migrating from Khmer #88

Open
dcdanko opened this issue Nov 1, 2024 · 4 comments

@dcdanko

dcdanko commented Nov 1, 2024

Hi,

My team is currently using Khmer and we want to upgrade to Oxli. We currently use khmer's Nodetable to store binary present/not-present kmer sets. We have a few large precomputed nodetables.

My questions:

  • Do we need to recreate the large files or can we use them as is?
  • What is the corresponding class in Oxli to khmer's nodetable?

Thank you for your help!

@Adamtaranto
Collaborator

Hi David,

Atm there are three ways to manually set the counts in a KmerCountTable object.

  1. Populate KmerCountTable from a JSON file.

Oxli supports serialisation of KmerCountTable objects to JSON. You can modify this file and load it back into a new object. See wiki description.

  2. Set individual kmer values using dictionary syntax.

from oxli import KmerCountTable

# Create new count table
# Note: use store_kmers=True only if you need to retrieve a list of all kmers in the table. This option slows counting.
kct = KmerCountTable(ksize=4)

# Manually add new kmer and set count
kct['GGGG'] = 1000

# Only the canonical kmer is stored, so a kmer and its reverse complement share one count
kct.get('GGGG')
>>> 1000

kct.get('CCCC')
>>> 1000

  3. Add counts with a user-specified hash.

This might be useful if you only have hashes for canonical kmers stored.

# Add and increment count for hash
kct.count_hash(6779379503393060785)

kct.get("AACC")
>>> 1

kct.get("GGTT")
>>> 1

# Increment count
kct.count_hash(6779379503393060785)

kct.get("AACC")
>>> 2

I could add support for bulk upload of kmers and their counts from a tab-delimited file if that would be useful. See #77.

Do you need to store kmers and their reverse complement separately? Oxli currently stores counts under the canonical kmer.
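
To illustrate why 'GGGG' and 'CCCC' (or "AACC" and "GGTT") return the same count in the examples above, here is a toy sketch of canonical-kmer storage. It is not Oxli's implementation: it uses a plain dict instead of hashes, and it assumes the canonical form is the lexicographically smaller of a kmer and its reverse complement. The helper names are made up for illustration.

```python
# Toy illustration only; NOT Oxli's implementation.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Assumption: canonical = the lexicographically smaller of
    the kmer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


# A plain dict keyed by canonical kmer stands in for the count table.
counts = {}
counts[canonical("GGGG")] = 1000

# Both strands resolve to the same key, so they share one count.
print(counts.get(canonical("GGGG")))
print(counts.get(canonical("CCCC")))
print(canonical("AACC") == canonical("GGTT"))
```

If you need per-strand presence, a scheme like this cannot provide it, since both orientations collapse to one key.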

@Adamtaranto
Collaborator

@ctb might be worth adding a khmer migration tutorial to the wiki.

lmk what you think is the most efficient way to do this.

@dcdanko
Author

dcdanko commented Nov 4, 2024

@Adamtaranto thank you for the info. I believe the count_hash method will be sufficient for us, since I think hashes are what is stored in the khmer nodetables. We don't really need the bulk add in this case.

Khmer uses an old version of oxli, right? Will the hash function be the same?
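
For reference, the count_hash approach for a present/not-present set could be sketched roughly as below. This assumes you already have the 64-bit hash values as an iterable of ints (how to get them out of a khmer nodetable is not covered in this thread) and that the target table starts empty. `load_presence` and `StandInTable` are hypothetical names; the stand-in class only exists so the sketch runs without oxli installed, and in practice you would pass a `KmerCountTable` instead.

```python
from typing import Iterable


def load_presence(table, hashes: Iterable[int]) -> int:
    """Call table.count_hash() once per distinct hash so every count is 1.

    Returns the number of distinct hashes added.
    """
    seen = set()
    for hashval in hashes:
        if hashval in seen:
            continue  # skip duplicates to keep counts binary
        seen.add(hashval)
        table.count_hash(hashval)
    return len(seen)


class StandInTable:
    """Hypothetical stand-in exposing only count_hash(), for demonstration."""

    def __init__(self):
        self.counts = {}

    def count_hash(self, hashval: int) -> int:
        self.counts[hashval] = self.counts.get(hashval, 0) + 1
        return self.counts[hashval]


table = StandInTable()
added = load_presence(table, [6779379503393060785, 6779379503393060785, 42])
print(added)
```

Deduplicating on the Python side keeps memory proportional to the number of distinct hashes, so for very large sets you may prefer to deduplicate the input beforehand.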

@Adamtaranto
Collaborator

I believe the hashing should be the same, murmurhash64 in both cases. @ctb?
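
Until the hash compatibility is confirmed, one way to test it empirically is to load the old hashes and then look up a few kmers you know were in the original set. A rough sketch, assuming `get()` returns 0 (or nothing) for an absent kmer; `missing_kmers` and `StandInTable` are hypothetical names, and the stand-in is only there so the snippet runs without oxli:

```python
from typing import Iterable, List


def missing_kmers(table, known_kmers: Iterable[str]) -> List[str]:
    """Return the known kmers that the table reports as absent."""
    return [kmer for kmer in known_kmers if not table.get(kmer)]


class StandInTable:
    """Hypothetical stand-in exposing only get(), for demonstration."""

    def __init__(self, present):
        self._present = set(present)

    def get(self, kmer: str) -> int:
        return 1 if kmer in self._present else 0


table = StandInTable({"AACC"})
print(missing_kmers(table, ["AACC", "ACGT"]))
```

If kmers known to be in the khmer table come back missing after loading, the hash functions (or their seeds) most likely differ.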
