Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is.ultrametric unexpected behavior #122

Open
pedrosenna opened this issue Jul 12, 2024 · 4 comments
Open

is.ultrametric unexpected behavior #122

pedrosenna opened this issue Jul 12, 2024 · 4 comments

Comments

@pedrosenna
Copy link

pedrosenna commented Jul 12, 2024

Dear Emmanuel,

first of all, thanks for this amazing package.

I just found an unexpected behavior when i was trying to run splits::gmyc() over a set of post-burnin files from BEAST2 using purrr::map(), which is the tidyverse's lapply() function.

The problem seems to be in the is.ultrametric() evaluation made prior to the analysis. For example, running over a set of 1000 ultrametric trees generated using BEAST2:

Evaluating a multiPhylo object using is.ultrametric():

all(is.ultrametric(trees))

[1] TRUE

Evaluating a multiPhylo object using is.ultrametric() inside lapply():

suppressWarnings(all(lapply(trees, is.ultrametric)))

[1] TRUE

Evaluating a multiPhylo object using is.ultrametric() inside purrr::map():

suppressWarnings(all(purrr::map(trees, is.ultrametric)))

[1] FALSE

I really don't understand what's going on here. Any ideas?

@emmanuelparadis
Copy link
Owner

Dear Pedro,

Thanks for your appreciation (always appreciated!)

I run the following code (using sapply and unlist to avoid warnings) a couple of minutes and it run fine even with 100 tips (though it's a bit slow...):

i <- 0
repeat {
    i <- i + 1
    trees <- replicate(1000, rcoal(10), FALSE)
    class(trees) <- "multiPhylo"
    stopifnot(all(is.ultrametric(trees)))
    stopifnot(all(sapply(trees, is.ultrametric)))
    stopifnot(all(unlist(purrr::map(trees, is.ultrametric))))
}

Just in case: you can change the argument option of is.ultrametric(). From ?is.ultrametric:

The test is based on the distances from each tip to the root and a
criterion: if ‘option = 1’, the criterion is the scaled range
((max - min/max)), if ‘option = 2’, the variance is used (this was
the method used until ape 3.5). The default criterion is invariant
to linear changes of the branch lengths.

Best,

Emmanuel

@pedrosenna
Copy link
Author

pedrosenna commented Jul 12, 2024

Dear Emmanuel,

Thanks for the quick response! After running the same code you provided on my notebook, i wondered if that could be somehow related to how i am importing the tree files into R. I am importing BEAST2 trees using ape::read.nexus(). I did a test converting the BEAST2 files into newick format using FigTree and purrr:map() worked as intended.

Here's a reproducible example using your code.

trees <- replicate(1000, rcoal(10), FALSE)

class(trees) <- "multiPhylo"

write.nexus(trees, file = "data/test/random_trees.nex")

trees2 <- read.nexus(file= "data/test/random_trees.nex")

identical(trees, trees2) # FALSE

all(is.ultrametric(trees2)) # TRUE

all(sapply(trees2, is.ultrametric)) # TRUE

all(unlist(purrr::map(trees2, is.ultrametric))) # FALSE

I wonder why trees and trees2 fails to pass through identical() testing. File sizes also differ a lot (2.3 MB vs 1.5 MB).

@pedrosenna
Copy link
Author

pedrosenna commented Jul 13, 2024

Dear Emmanuel,

here's another example of the same behavior after saving trees in nexus format and then reading back to R.

library(ape)
library(tidyverse)

set.seed(123)
trees <- replicate(1000, rcoal(10), FALSE)

write.nexus(trees, file = "random_trees.nex")
write.tree(trees, file = "random_trees.nwk")

trees_nex <- read.nexus("random_trees.nex")
trees_nwk <- read.tree("random_trees.nwk")

all(unlist(map(trees_nex, is.ultrametric))) # FALSE
all(unlist(map(trees_nwk, is.ultrametric))) # TRUE

Also, when comparing different str() levels between trees, i found this:

identical(trees_nex, trees_nwk)

[1] FALSE

identical(trees_nex[1], trees_nwk[1])

[1] FALSE

identical(trees_nex[[1]], trees_nwk[[1]])

[1] TRUE

str(trees_nex[1])

Class "multiPhylo"
List of 1
$ :List of 3
..$ edge : int [1:18, 1:2] 11 12 12 11 13 13 14 15 15 16 ...
..$ edge.length: num [1:18] 2.827 0.119 0.119 2.726 0.22 ...
..$ Nnode : int 9
..- attr(, "class")= chr "phylo"
..- attr(
, "order")= chr "cladewise"

  • attr(*, "TipLabel")= chr [1:10] "t10" "t7" "t1" "t8" ...
str(trees_nwk[1])

Class "multiPhylo"
List of 1
$ :List of 4
..$ edge : int [1:18, 1:2] 11 12 12 11 13 13 14 15 15 16 ...
..$ edge.length: num [1:18] 2.827 0.119 0.119 2.726 0.22 ...
..$ Nnode : int 9
..$ tip.label : chr [1:10] "t10" "t7" "t1" "t8" ...
..- attr(, "class")= chr "phylo"
..- attr(
, "order")= chr "cladewise"

Edit: Just found a workaround checking tracerer::parse_beast_trees.R here

# Use c to strip the state names and convert it to a pure multiPhylo
  trees_nex <- c(trees_nex)

all(unlist(map(trees_nex, is.ultrametric)))

[1] TRUE

@emmanuelparadis
Copy link
Owner

Dear Pedro,

I think there are several issues behind these results.

  1. The order of the elements in "phylo" objects is not mandatory, so you cannot always rely on identical() for comparing trees: use all.equal() instead (see ?all.equal.phylo for details). Of course, if identical() returns TRUE, so does all.equal().

  2. When printing numerical (ie, real, not integers) values in a file, there is (often) the issue of how many digits to print which may "distort" the values, eg, printing 16 digits is OK but not 10 digits:

R> as.numeric(sprintf("%.16f", pi)) == pi
[1] TRUE
R> as.numeric(sprintf("%.10f", pi)) == pi
[1] FALSE

write.tree() has the option digits = 10 (for the branch lengths) which can be modified. However, write.nexus() has no such option and 10 digits are printed (unless you modify the code yourself).

If you just need to save trees to use them later in R/ape, my recommendation is to use saveRDS(): this is faster, the file is compressed, and there is no loss in numerical precision for the branch lengths.

  1. Because assessing ultrametricity is a statistical question, you may have encountered something like the followings:
R> tr <- rcoal(10)
R> is.ultrametric(tr, options = 1)
[1] TRUE
R> is.ultrametric(tr, options = 2)
[1] TRUE
R> ts <- read.tree(text = write.tree(tr, digits = 6))
R> is.ultrametric(ts, option = 1)
[1] FALSE
R> is.ultrametric(ts, option = 2)
[1] TRUE

Best,

Emmanuel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants