Scripts to empirically show how spaced seed entropy is related to their sensitivity for homology search.

emreerhan/spaced-seeds

Spaced Seed Design

There are many heuristics for spaced seed design. However, most of them are poorly suited to designing large spaced seeds (around k=60 and w=20).

My undergraduate directed studies project examined whether Shannon entropy can be used as an approximation of spaced seed quality for database searching, particularly when using BioBloom Tools (BBT).
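To make the idea of scoring a seed by its entropy concrete, here is a minimal sketch. It is not taken from the scripts in this repository: the function name `block_entropy`, the representation of a seed as a string of `1` (match) and `0` (don't care) characters, and the choice to measure entropy over overlapping blocks of length `n` are all assumptions made for illustration. The manuscript may define seed entropy differently.

```python
import math
from collections import Counter


def block_entropy(seed: str, n: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping length-n blocks of a seed.

    Hypothetical helper: the seed is assumed to be a string of '1' and '0'.
    """
    if n < 1 or n > len(seed):
        raise ValueError("n must be between 1 and len(seed)")
    # Collect every overlapping window of length n.
    blocks = [seed[i:i + n] for i in range(len(seed) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    # H = sum over blocks of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    for seed in ("1111", "1100", "1010"):
        print(seed, block_entropy(seed, n=2))
```

Under this assumed definition, a contiguous seed such as `1111` scores zero, while seeds whose local patterns vary more score higher.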

Scripts

  • make_seeds.py: Randomly generates spaced seeds of a given k and w
  • markov_process_seeds.py: Generates spaced seeds of varying entropy for a given k and w
  • determine_uniqueness.py: Determines the uniqueness of the set of words produced by a TSV file of spaced seeds for a given genome
  • select_multi_spaced_seeds.py: Generates a list of spaced seed sets, each containing 5 spaced seeds, designed for use in BBT
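
The general idea behind random seed generation and word uniqueness can be sketched as follows. This is an illustrative sketch, not the code in `make_seeds.py` or `determine_uniqueness.py`. It assumes that k is the seed length and w is the weight (the number of match positions), that a seed is written as a string of `1` and `0`, and that "uniqueness" means the fraction of distinct spaced words among all words extracted from a sequence. The function names `random_seed`, `spaced_words`, and `uniqueness` are hypothetical.

```python
import random


def random_seed(k: int, w: int, rng: random.Random) -> str:
    """Return a random seed of length k with exactly w match ('1') positions."""
    if not 0 <= w <= k:
        raise ValueError("w must be between 0 and k")
    care = set(rng.sample(range(k), w))
    return "".join("1" if i in care else "0" for i in range(k))


def spaced_words(sequence: str, seed: str):
    """Yield the word seen through the seed at every window of the sequence."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(sequence) - len(seed) + 1):
        # Keep only the characters at the seed's match positions.
        yield "".join(sequence[start + i] for i in care)


def uniqueness(sequence: str, seed: str) -> float:
    """Fraction of distinct spaced words (assumed definition of uniqueness)."""
    words = list(spaced_words(sequence, seed))
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    seed = random_seed(k=10, w=4, rng=rng)
    print(seed, uniqueness("ACGTACGTTAGCCGATAGCTAGGCTA", seed))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when comparing seeds across experiments.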

Manuscript

https://goo.gl/Qaed8m
