Home
PyGas is a small module intended to provide basic sequence alignment support, specifically aimed at CRISPR guides.
- Query: sequenced reads (experimental), e.g. FASTQ data
- Target: target sequences (designed), e.g. a CRISPR guide library
There are features for both fast, rough alignment and slower, more robust analysis.
- Fast
  - Exact or substring matching
- Slower
  - Attempt fast matching first
  - Smith-Waterman (with low-score escapes)
Each query is compared against all targets as an atomic step.
You can control how a query may match a target via the `Aligner.match_type` variable:
- Exact only
  - Query and target are of the same length (identical)
- Query in Target
  - Query sequence must be within the target (or exact)
- Target in Query
  - Target sequence must be within the query (or exact)
- Any (Query in Target + Target in Query)
  - Allow any match scenario
All of the above are scored by the total number of bases matched (the score equals the length of the shorter sequence).
Multiple alignments can be returned; all will have the same score.
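The fast-matching options above can be sketched as a single pass of one query over every target. `MatchType` and `fast_match` are hypothetical names, not the PyGas API, and the real `Aligner.match_type` values may differ. The sketch also assumes that only the highest-scoring hits are kept, which is one reading of "all will have the same score".

```python
from enum import Enum


class MatchType(Enum):
    """Hypothetical stand-in for the options accepted by Aligner.match_type."""
    EXACT = "exact"
    QUERY_IN_TARGET = "query_in_target"
    TARGET_IN_QUERY = "target_in_query"
    ANY = "any"


def fast_match(query, targets, match_type=MatchType.EXACT):
    """Return (score, hits) for one query compared against all targets.

    The score is the number of bases matched, i.e. the length of the
    shorter sequence; only targets sharing the best score are returned.
    """
    allow_q_in_t = match_type in (MatchType.QUERY_IN_TARGET, MatchType.ANY)
    allow_t_in_q = match_type in (MatchType.TARGET_IN_QUERY, MatchType.ANY)
    best_score, hits = 0, []
    for target in targets:
        matched = (
            query == target                       # exact is always allowed
            or (allow_q_in_t and query in target)  # query is a substring of target
            or (allow_t_in_q and target in query)  # target is a substring of query
        )
        if not matched:
            continue
        score = min(len(query), len(target))
        if score > best_score:
            best_score, hits = score, [target]
        elif score == best_score:
            hits.append(target)
    return best_score, hits


print(fast_match("ACGT", ["ACGTACGT", "ACGT", "TTTT"], MatchType.ANY))
# (4, ['ACGTACGT', 'ACGT'])
```

With `MatchType.EXACT`, the same call returns only `['ACGT']`, because the substring branches are disabled.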
Smith-Waterman alignment is controlled by the `Aligner.exact_match` variable. Setting it to `false` passes any reads not mapped by fast matching to the Smith-Waterman alignment function.
This performs a basic local alignment. However, it tracks the highest score obtained while scanning the guide list and aborts an alignment early if it cannot improve on that score (based on the sequence remaining to populate in the matrix).
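The early abort can be sketched as a row-by-row Smith-Waterman fill with an upper-bound check after each row. The scoring values (match +1, mismatch -1, gap -2) and the names `sw_best_score` and `best_targets` are assumptions for illustration. PyGas's actual scoring and escape test may differ.

```python
def sw_best_score(query, target, threshold=0, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of query vs target, or None if aborted.

    Assumes match > 0 and gap <= 0. After each query row, no later cell can
    exceed (max score in this row) + match * (query rows left), since each
    remaining query base adds at most `match` and gaps only subtract.
    If neither that bound nor the best score so far reaches `threshold`,
    the alignment is abandoned (a low-score escape).
    """
    cols = len(target) + 1
    prev = [0] * cols
    best = 0
    for i, q in enumerate(query, start=1):
        row = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if q == target[j - 1] else mismatch)
            row[j] = max(0, diag, prev[j] + gap, row[j - 1] + gap)
        row_max = max(row)
        best = max(best, row_max)
        rows_left = len(query) - i
        if best < threshold and row_max + match * rows_left < threshold:
            return None  # cannot tie or beat the current best
        prev = row
    return best


def best_targets(query, targets, **scoring):
    """Scan the guide list, using the running best as the abort threshold."""
    best_score, hits = 0, []
    for target in targets:
        score = sw_best_score(query, target, threshold=best_score, **scoring)
        if not score:  # aborted (None) or no positive local alignment
            continue
        if score > best_score:
            best_score, hits = score, [target]
        elif score == best_score:
            hits.append(target)
    return best_score, hits


print(best_targets("ACGT", ["ACGT", "ACTT", "GGGG"]))
# (4, ['ACGT'])
```

The threshold is the running best rather than the running best plus one. As a result, a guide is abandoned only once it provably cannot tie the current best, and equal-scoring alignments are still collected.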
The "fuzziness" of matching can be controlled by a rule-based system.
Rules have a direct impact on run time because they increase the time taken to abort an alignment. Individual costs are as follows:
- `M` (mismatch) = 1
- `I` (insertion) = 2 (single b.p.)
- `D` (deletion) = 2 (single b.p.)
Performance is only affected by the maximum penalty you allow, for example:
- `MI` has the same penalty (3) as `MMM`, so mappings with 3 mismatches will be allowed while performing alignments but then discarded.
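The costs and the `MI`/`MMM` example reduce to simple arithmetic. This is a sketch with hypothetical helper names. It assumes that a rule's penalty is the sum of its per-edit costs and that the largest penalty across all rules is what governs run time.

```python
# Per-edit costs from the list above: M = 1, I = 2, D = 2 (single b.p.).
EDIT_COST = {"M": 1, "I": 2, "D": 2}


def rule_penalty(rule):
    """Total penalty of a rule string such as "MI" or "MMM"."""
    return sum(EDIT_COST[edit] for edit in rule)


def max_penalty(rules):
    """Largest penalty among the rules (0 if no rules are given)."""
    return max((rule_penalty(rule) for rule in rules), default=0)


print(rule_penalty("MI"), rule_penalty("MMM"))  # 3 3
print(max_penalty(["MM", "MI"]))                # 3
```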
Be aware that if you wish to allow up to 2 mismatches or 1 mismatch + 1 b.p. insert, you must specify:

```
pycroquet ... --rules MM --rules MI
```
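One way to read the `--rules MM --rules MI` example is that each rule caps how many of each edit type an alignment may use, and a mapping is kept if any single rule covers it. The sketch below assumes that interpretation, including that edit order does not matter. `edits_allowed` is a hypothetical helper, not part of pycroquet or PyGas.

```python
from collections import Counter


def edits_allowed(edits, rules):
    """True if an alignment's edits (e.g. "MI") fit within at least one rule.

    Assumption: a rule is an upper bound per edit type, so "MM" also
    admits zero or one mismatch, and order within a rule is ignored.
    """
    needed = Counter(edits)
    for rule in rules:
        allowed = Counter(rule)
        if all(needed[edit] <= allowed[edit] for edit in needed):
            return True
    return False


rules = ["MM", "MI"]
print(edits_allowed("MM", rules))   # True
print(edits_allowed("IM", rules))   # True
print(edits_allowed("MMM", rules))  # False: same penalty as MI, but discarded
```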
As with fast matching, Smith-Waterman alignment can return multiple alignments; all will have the same score.