You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am testing edlib for the purpose of calculating pairwise identites between sequences, using the BLAST/USEARCH definition of sequence identity; i.e. $\text{id} = \frac{\text{edit distance}}{\text{alignment length}}$. One difficulty I am finding is that there can be multiple alignments which have the same edit distance, but different lengths, even in the fully global case, because it may be possible to trade, e.g., 2 mismatches for 1 insertion and 1 deletion. This means that the pairwise identity is not uniquely defined. Is there any way for edlib to give me a maximally long and/or maximally short CIGAR, or even just calculate what the maximum and minimum lengths are?
If this is possible but would require some coding, I'd be willing to give it a shot if you give me some pointers.
The text was updated successfully, but these errors were encountered:
@brendanf I am not 100% sure as it has been some time since I have implemented the core part of Edlib, but off the bat I would guess that is probably not easy to do. Edlib is fast because it is optimized to use edit distance as scoring mechanism, and adding any additional criteria, like for example to try and find minimal size of cigar, will almost certainly be hard to fit into that.
One thing you can do is tell it to return multiple possible positions at which the alignment ends -> is that helpful? You can't make sure those are the min and max though if I am correct.
I can't give much better answer right now without spending significant amount of time on figure this all out. If you are intersted, there is a link to an article for Edlib that I wrote, that explains inner workings in detail, so that could shine some light on your question also.
I am testing edlib for the purpose of calculating pairwise identites between sequences, using the BLAST/USEARCH definition of sequence identity; i.e.$\text{id} = \frac{\text{edit distance}}{\text{alignment length}}$ . One difficulty I am finding is that there can be multiple alignments which have the same edit distance, but different lengths, even in the fully global case, because it may be possible to trade, e.g., 2 mismatches for 1 insertion and 1 deletion. This means that the pairwise identity is not uniquely defined. Is there any way for edlib to give me a maximally long and/or maximally short CIGAR, or even just calculate what the maximum and minimum lengths are?
If this is possible but would require some coding, I'd be willing to give it a shot if you give me some pointers.
The text was updated successfully, but these errors were encountered: