A python/c++ module to store large amount of sequences and look at hamming distance clustering. Should be a lot faster than the naive method (measuring every hamming distances between pairs).
After cloning the git repository:
pip3 install atriegc
import atriegc
tr = atriegc.TrieNucl()
tr.insert("AAAATGC")
tr.insert("ATAATGC")
tr.insert("TTTTTGC")
max_hamming_distance = 1
print(tr.neighbours("AAATTGC", max_hamming_distance))
print(tr.clusters(max_hamming_distance))
Where aminoacid are indicated with capital letters.
tr = atriegc.TrieAA()
tr.insert("CARGKYSPATFDSW")
The alphabet should be passed as a string which lists all the possible characters of the alphabet
tr = atriegc.Trie("abcdef")