ATrieGC

A Python/C++ module for storing a large number of sequences and clustering them by Hamming distance. It should be much faster than the naive method, which measures the Hamming distance between every pair of sequences.
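For comparison, the naive method mentioned above can be sketched in pure Python. This is an illustration only, not part of atriegc; the names `hamming` and `naive_close_pairs` are made up for this example. It compares every pair, so the work grows quadratically with the number of sequences.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def naive_close_pairs(seqs, max_distance):
    """Return every pair of sequences within max_distance (checks all pairs)."""
    return [
        (a, b)
        for a, b in combinations(seqs, 2)
        if hamming(a, b) <= max_distance
    ]


if __name__ == "__main__":
    seqs = ["AAAATGC", "ATAATGC", "TTTTTGC"]
    print(naive_close_pairs(seqs, 1))
```

With n sequences this performs n(n-1)/2 comparisons, which is the cost a trie-based search is meant to avoid.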

Installation

After cloning the git repository:

pip3 install atriegc

Usage

Working with the nucleotide alphabet

import atriegc

tr = atriegc.TrieNucl()
tr.insert("AAAATGC")
tr.insert("ATAATGC")
tr.insert("TTTTTGC")

max_hamming_distance = 1
print(tr.neighbours("AAATTGC", max_hamming_distance))
print(tr.clusters(max_hamming_distance))
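
To sanity-check results on a small data set, a slow pure-Python reference can be useful. The sketch below is not part of atriegc: `naive_neighbours` and `naive_clusters` are hypothetical helpers, and it assumes that a cluster is a group of sequences linked by chains of neighbours within the maximum distance. Whether atriegc defines clusters the same way, and in which format it returns them, is not stated here.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_neighbours(seqs, query, max_distance):
    """Stored sequences of the same length as query within max_distance."""
    return [
        s for s in seqs
        if len(s) == len(query) and hamming(s, query) <= max_distance
    ]


def naive_clusters(seqs, max_distance):
    """Group sequences connected by chains of close pairs (assumed definition)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            same_length = len(seqs[i]) == len(seqs[j])
            if same_length and hamming(seqs[i], seqs[j]) <= max_distance:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    seqs = ["AAAATGC", "ATAATGC", "TTTTTGC"]
    print(naive_neighbours(seqs, "AAATTGC", 1))
    print(naive_clusters(seqs, 1))
```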

Working with the amino acid alphabet

Amino acids are written as capital letters.

tr = atriegc.TrieAA()
tr.insert("CARGKYSPATFDSW")

Working with a generic alphabet

The alphabet is passed as a string listing every character it can contain:

tr = atriegc.Trie("abcdef")
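
When the alphabet is not known in advance, it can be derived from the data before building the trie. The helpers below are a plain-Python sketch with hypothetical names (`alphabet_of`, `invalid_chars`); they do not call atriegc, and how atriegc itself reacts to a character outside the alphabet is not covered here.

```python
def alphabet_of(seqs):
    """Sorted string of every distinct character found in the sequences."""
    return "".join(sorted(set().union(*seqs)))


def invalid_chars(seq, alphabet):
    """Characters of seq that are missing from the alphabet string."""
    return sorted(set(seq) - set(alphabet))


if __name__ == "__main__":
    seqs = ["abcabc", "fedfed"]
    alphabet = alphabet_of(seqs)
    print(alphabet)
    print(invalid_chars("abz", alphabet))
```

The resulting string can then be passed to the generic trie constructor shown above, and `invalid_chars` gives a quick check to run on a sequence before inserting it.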