Hi, I really love using ape and have been using lots of the functions for calculating distance between sequences; thank you for this.
I've noticed the nucleotide substitution models that include nucleotide base frequencies (namely Felsenstein-81 and Tamura-Nei 93) are giving me unexpected values for sequences with gaps when pairwise deletion is turned on.
In testing, both dist.dna(dna_sequences, model = "F81", pairwise.deletion=FALSE) and dist.dna(dna_sequences, model = "F81", pairwise.deletion=TRUE) return the same value for pairs of sequences with gaps. I believe this is because the base frequency tabulations still count bases at sites where the other sequence has a gap (so they are not deleted after all).
> and I believe this is due to the base frequency tabulations are still counting bases corresponding to a base with a gap (they are not getting deleted after all.)
You're correct: the base frequencies are calculated with all the data, whatever the value of pairwise.deletion. The logic behind this is that in these models base frequencies are assumed to be constant over all sequences, so it's better to use all possible observed bases to estimate them.
You can see the possible impact of this by computing the distances after dropping all sequences but one pair at a time, something like this:
(This could be a bit slow if you have a very big data set.) D can be compared directly with the output of D0 <- dist.dna(dna_sequences, "F81", ......) for instance:
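A minimal sketch of the comparison described above, not the maintainer's original code. It assumes the alignment is a DNAbin matrix; the woodmouse data set shipped with ape stands in for dna_sequences, and pairwise_only_dist is a hypothetical helper name introduced here for illustration. Each distance in D is computed on a two-sequence alignment, so the base frequencies are estimated from that pair alone rather than from the whole data set.

```r
library(ape)

# Example alignment shipped with ape; a stand-in for the poster's dna_sequences.
data(woodmouse)
dna_sequences <- woodmouse

# Hypothetical helper (not part of ape): compute each pairwise distance on an
# alignment reduced to just that pair, so base frequencies come from the pair only.
pairwise_only_dist <- function(x, model = "F81", ...) {
  x <- as.matrix(x)
  n <- nrow(x)
  M <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d <- dist.dna(x[c(i, j), ], model = model, ...)
      M[i, j] <- M[j, i] <- as.numeric(d)
    }
  }
  as.dist(M)
}

# Pair-by-pair distances versus distances using base frequencies from all data.
D  <- pairwise_only_dist(dna_sequences, pairwise.deletion = TRUE)
D0 <- dist.dna(dna_sequences, model = "F81", pairwise.deletion = TRUE)

# Numerical summary of the differences, and a scatter plot against the 1:1 line.
print(summary(as.vector(D) - as.vector(D0)))
plot(as.vector(D0), as.vector(D), xlab = "D0 (all sequences)", ylab = "D (pair only)")
abline(0, 1)
```

The double loop calls dist.dna once per pair, which is why this can be slow on a large alignment. Points far from the 1:1 line indicate pairs whose distance is sensitive to how the base frequencies are estimated.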