You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When a sequence in a DNAbin or phyDat alignment contains a missing value in place of a nucleotide, and the alignment is processed with a distance function from phangorn (e.g., dist.ml), R crashes from a sementation fault.
On linux (Ubuntu 18.04.5) R exits with the following error:
Process was terminated
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
I understand that technically, an alignment probably shouldn't contain missing values, but the function vcfR::vcfR2DNAbin does output some NA's. Either way, these are valid values in a phyDat object and a DNAbin object, so probably phangorn's distance metrics should be able to handle them. At the very least, the behaviour is really annoying, and was pretty difficult to get to the root cause of.
Here is a MWE that is reproducable on my machine.
# Just make a sample sequence
make_seq <- function(size=size){
nucl <- c('A', 'T', 'G', 'C')
sample(nucl, replace = TRUE, size=size)
}
# Make a DNAbin object
fake_alignment <- function(size, n=10, add_na=FALSE){
l = list()
for (i in c(1:n)){
seq <- c(make_seq(size=size))
if (add_na==TRUE){ # Add an NA to the end of the alignment
seq <- c(seq, NA)
}
l[[i]] <- seq
}
m <- do.call('rbind', l)
ape::as.DNAbin(m)
}
# This works!
dna <- fake_alignment(n=2, size=1000, add_na=FALSE)
dist <- phangorn::dist.ml(dna)
# Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
dna_with_na <- fake_alignment(n=2, size=1000, add_na=TRUE)
dist <- phangorn::dist.ml(dna_with_na
The text was updated successfully, but these errors were encountered:
Hi @ejurga,
can you send me a short example with vcfR::vcfR2DNAbin code which produces this error?
What would be the proper or expected handling of NA's? Removing or coding as "?" or "-" ?
Thanks,
Klaus
Hi,
This is a bug in ape: the conversion from "character" to "DNAbin" does not consider that some values could be NA.
You can fix this bug 'by hand' since as.DNAbin.character() is not hidden. So, do the usual fix(as.DNAbin.character), then insert this line:
x[is.na(x)] <-"?"
right at the top of the body. Save and close.
Another possibility is to do this manipulation before calling as.DNAbin().
Best,
Emmanuel
When a sequence in a DNAbin or phyDat alignment contains a missing value in place of a nucleotide, and the alignment is processed with a distance function from phangorn (e.g., dist.ml), R crashes from a sementation fault.
On linux (Ubuntu 18.04.5) R exits with the following error:
I understand that technically, an alignment probably shouldn't contain missing values, but the function vcfR::vcfR2DNAbin does output some NA's. Either way, these are valid values in a phyDat object and a DNAbin object, so probably phangorn's distance metrics should be able to handle them. At the very least, the behaviour is really annoying, and was pretty difficult to get to the root cause of.
Here is a MWE that is reproducable on my machine.
The text was updated successfully, but these errors were encountered: