Improvements TODOs
leoisl edited this page Jul 9, 2019 · 10 revisions
- Scheduling
- Mutex/locks per site in grouped_allele_counts; or map-reduce (each thread gets its own coverage structure, and reduce later)
- Bloom filter on kmers in (variant sites of) prg; then reads must have all constituent kmers in filter, to get mapped
- Rework indexing so that only kmers overlapping variant sites are indexed (with max read size context)
- [Infer kmer-size for the index] Analyse the PRG string, find the region most dense in variant sites, and infer a kmer size for which the number of paths is still feasible to enumerate
- Singleton intervals: directly get the bwt character rather than double rank query it
- SearchState: Maybe, don't record allele/site combinations traversed, and do a round of forward mapping for successfully mapped reads, afterwards.
- Concurrent querying of alternate alleles in the same site; they are postfixed with the same even site marker. Later (at the end of the read, or on encountering a site entry boundary), we can record site/allele combo ids.
- Assign each (seen) Marker/Allele ID combination a unique ID, and store that in each `SearchState`
- Use `std::vector`, not `std::list`, for variant site paths, to avoid 64-bit pointer overhead. Maybe convert from list to vector after the kmer is indexed, and convert from vector to list for quasimapping. Further, the vector can be stored compressed, and only the `SearchState`s relevant for a given read be decompressed at quasimap.
- `SearchState::variant_site_path`s can all be stored in a vector of `VariantSitePath`, and in `SearchState` we would store only a `uint32_t` holding the id of its respective `variant_site_path`. This is useful only if many `SearchState`s have exactly the same `variant_site_path`.
- Store only one `uint` for singleton intervals!
- Represent `SearchState::variant_site_state` and `SearchState::invalid` as 1 byte and with masks. Add a getter method to the enum to make it transparent.
1. Represent as a set-trie (see https://hal.inria.fr/hal-01506780/document)
Implementations are already available, but not in C++:
Java: https://github.com/SmartDataAnalytics/TagMap
Rust: https://github.com/makoConstruct/set_trie
C++: us?
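
As a rough illustration of the Bloom filter item in the list above, the sketch below pre-screens reads before mapping. Everything in it is hypothetical rather than taken from the code base: the names `KmerBloomFilter` and `read_may_map`, the use of `std::hash` with double hashing, and kmers held as plain strings are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical minimal Bloom filter over kmer strings (illustration only).
class KmerBloomFilter {
public:
    KmerBloomFilter(std::size_t num_bits, std::size_t num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes) {
        assert(num_bits > 0);
    }

    void insert(const std::string &kmer) {
        for (std::size_t i = 0; i < num_hashes_; ++i)
            bits_[position(kmer, i)] = true;
    }

    // false: the kmer was definitely never inserted.
    // true: the kmer was possibly inserted (false positives can occur).
    bool possibly_contains(const std::string &kmer) const {
        for (std::size_t i = 0; i < num_hashes_; ++i)
            if (!bits_[position(kmer, i)]) return false;
        return true;
    }

private:
    // Double hashing: position_i = h1 + i * h2, where h2 hashes a salted copy.
    std::size_t position(const std::string &kmer, std::size_t i) const {
        const std::size_t h1 = std::hash<std::string>{}(kmer);
        const std::size_t h2 = std::hash<std::string>{}(kmer + "#") | 1;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t num_hashes_;
};

// A read stays a mapping candidate only if all its constituent kmers pass.
bool read_may_map(const KmerBloomFilter &filter, const std::string &read,
                  std::size_t k) {
    if (k == 0 || read.size() < k) return false;
    for (std::size_t start = 0; start + k <= read.size(); ++start)
        if (!filter.possibly_contains(read.substr(start, k))) return false;
    return true;
}
```

Because a Bloom filter has no false negatives, a read rejected here cannot consist solely of kmers that were inserted; a read that passes still has to go through the real mapping step.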
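
The idea of storing every `variant_site_path` once and keeping only a `uint32_t` id in each `SearchState` could be sketched as follows. The types here are simplified stand-ins, not the real definitions: `VariantSite` is assumed to be a (site marker, allele id) pair, and `PathTable` and `CompactSearchState` are hypothetical names. The lookup uses a `std::map`, which keeps a second copy of each path; a set-trie as suggested above might avoid that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Assumed, simplified stand-ins for the real types.
using VariantSite = std::pair<uint64_t, uint64_t>;  // (site marker, allele id)
using VariantSitePath = std::vector<VariantSite>;

// Hypothetical table that stores each distinct path exactly once.
class PathTable {
public:
    // Returns the id of `path`, adding it to the table the first time it is seen.
    uint32_t intern(const VariantSitePath &path) {
        const auto found = ids_.find(path);
        if (found != ids_.end()) return found->second;
        assert(paths_.size() < std::numeric_limits<uint32_t>::max());
        const auto id = static_cast<uint32_t>(paths_.size());
        paths_.push_back(path);
        ids_.emplace(path, id);
        return id;
    }

    const VariantSitePath &get(uint32_t id) const { return paths_.at(id); }
    std::size_t size() const { return paths_.size(); }

private:
    std::vector<VariantSitePath> paths_;       // id -> path
    std::map<VariantSitePath, uint32_t> ids_;  // path -> id
};

// Hypothetical slimmed-down search state: a 4-byte id replaces the whole path.
struct CompactSearchState {
    uint64_t sa_left;
    uint64_t sa_right;
    uint32_t path_id;
};
```

As the list item notes, this only pays off when many search states share identical paths; otherwise the table adds an indirection without saving memory.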
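
For the item on packing `variant_site_state` and `invalid` into one byte, a minimal sketch is given below. C++ enums cannot carry member functions, so the getters live on a small wrapper class here. The enumerator names, the bit layout, the default of `unknown`, and the class name `PackedStateFlags` are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed enumerators; the values must fit in the two low bits.
enum class SearchVariantSiteState : uint8_t {
    within_variant_site = 0,
    outside_variant_site = 1,
    unknown = 2
};

// Hypothetical one-byte holder: bits 0-1 store the state, bit 2 the invalid flag.
class PackedStateFlags {
public:
    SearchVariantSiteState variant_site_state() const {
        return static_cast<SearchVariantSiteState>(bits_ & kStateMask);
    }

    bool invalid() const { return (bits_ & kInvalidMask) != 0; }

    void set_variant_site_state(SearchVariantSiteState s) {
        assert(static_cast<uint8_t>(s) <= kStateMask);
        bits_ = static_cast<uint8_t>((bits_ & ~kStateMask) |
                                     static_cast<uint8_t>(s));
    }

    void set_invalid(bool value) {
        bits_ = static_cast<uint8_t>(value ? (bits_ | kInvalidMask)
                                           : (bits_ & ~kInvalidMask));
    }

private:
    static constexpr uint8_t kStateMask = 0x03;
    static constexpr uint8_t kInvalidMask = 0x04;
    // Assumed default: state unknown, not invalid.
    uint8_t bits_ = static_cast<uint8_t>(SearchVariantSiteState::unknown);
};

static_assert(sizeof(PackedStateFlags) == 1, "flags should occupy one byte");
```

Callers would read `flags.variant_site_state()` and `flags.invalid()` instead of touching two separate fields, which keeps the masking hidden from the rest of the code.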