diff --git a/README.md b/README.md
index 5939727..d6aae67 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,27 @@
 # lshmm
-This is a Python library for prototyping and testing implementations of algorithms using the Li & Stephens (2003) HMM.
+**lshmm** is a Python library for prototyping, experimenting, and testing implementations of algorithms using the Li & Stephens (2003) Hidden Markov Model.
 
 ## Usage
 
 ### Inputs
-Reference panel contains sample and/or (partial) ancestral haplotypes.
+#### Data
+* Sample and/or ancestral haplotypes comprising a reference panel.
+* Query haplotypes.
 
-### Demo
-Forwards algorithm
-Backwards algorithm
+In the haploid mode, the alleles in haplotypes can be represented by any integer value (besides `-1` and `-2`, which are special values). In the diploid mode, the genotypes (encoded as allele dosages) can be `0` (homozygous for the reference allele), `1` (heterozygous for the alternative allele), or `2` (homozygous for the alternative allele). Currently, multiallelic sites are supported only in the haploid mode, but not the diploid mode.
+
+Note that there are two special values `NONCOPY` and `MISSING`. `NONCOPY` (or `-2`) represents non-copiable states, and can only be found in partial ancestral haplotypes in the reference panel. `MISSING` (or `-1`) represents missing data, and can be found only in query haplotypes.
+
+#### Parameters
+* Per-site recombination probabilities.
+* Per-site mutation probabilities.
+
+### Algorithms
 Viterbi algorithm
 Log-likelihood evaluation of a copying path
 
 ### Features
 * Scaling of mutation rate by the number of distinct alleles per site.
-* Non-copiable allelic state in the reference panel (`NONCOPY`).
-* Missing allelic state in the query (`MISSING`).
+* Non-copiable state in the reference panel (`NONCOPY`).
+* Missing state in the query (`MISSING`).
 * Multiallelic sites.
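
The encoding rules added to the README above can be sketched as a small input check. This is only an illustration of the stated rules, not lshmm's API: the constants are redefined locally, the helper names `check_haploid_inputs` and `allele_dosage` are hypothetical, and the sites-by-haplotypes layout of the reference panel is an assumption.

```python
# Special values as described in the README text. They are defined locally
# here (an assumption for illustration) rather than imported from lshmm.
NONCOPY = -2  # non-copiable state; reference panel only
MISSING = -1  # missing data; query only


def check_haploid_inputs(ref_panel, query):
    """Hypothetical helper: enforce the special-value rules in haploid mode.

    ref_panel: list of sites, each a list of integer alleles, one per
               reference haplotype (layout assumed for this sketch).
    query:     list of integer alleles, one per site.
    """
    if len(ref_panel) != len(query):
        raise ValueError("reference panel and query must cover the same sites")
    for site, alleles in enumerate(ref_panel):
        # MISSING may appear only in the query.
        if MISSING in alleles:
            raise ValueError(f"MISSING in reference panel at site {site}")
    for site, allele in enumerate(query):
        # NONCOPY may appear only in the reference panel.
        if allele == NONCOPY:
            raise ValueError(f"NONCOPY in query at site {site}")


def allele_dosage(a1, a2):
    """Hypothetical helper: encode a biallelic diploid genotype as a dosage.

    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative.
    """
    if a1 not in (0, 1) or a2 not in (0, 1):
        raise ValueError("diploid mode is described for biallelic sites only")
    return a1 + a2


# Two sites, three reference haplotypes; one partial ancestral haplotype
# carries NONCOPY at the first site, and the query is missing the second site.
ref_panel = [[0, 1, NONCOPY], [1, 1, 0]]
query = [0, MISSING]
check_haploid_inputs(ref_panel, query)
```

Whether `MISSING` is also accepted in diploid queries is not stated in the README text, so the sketch leaves that case out.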