Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in if (pp < 0.025) { : missing value where TRUE/FALSE needed #24

Open
avish96 opened this issue Sep 9, 2024 · 2 comments
Open
Labels

Comments

@avish96
Copy link

avish96 commented Sep 9, 2024

The following code results in an error that I am unable to pinpoint the cause of. Any help would be appreciated!

## Loading libraries
library(ape)
library(treedater)

I first load my sample simulated time tree and simulate a rate tree from it, with one branch having a 10x higher rate than the rest.

## Read a sample time tree 
nwk_text <- '((((b_0_88:0.765387,b_0_140:3.12651)g_0_64:1.11098,(b_0_145:2.09135,b_0_189:3.73543)g_0_98:2.20924)g_0_45:2.25562,b_0_167:7.52794)g_0_12:0.442501,((b_0_42:0.493604,b_0_91:2.75931)g_0_30:0.405358,(b_0_44:0.530121,(((o_0_161:0.811279,o_0_306:0.811279)g_0_306:5.63543,b_0_249:3.0243)g_0_161:2.11281,(b_0_260:1.56686,(o_0_248:3.45721,o_0_216:3.45721)g_0_248:1.15911)g_0_216:3.9432)g_0_113:3.56052)g_0_37:0.54448)g_0_22:1.50395)g_0_9;'
v_t <- read.tree(text=nwk_text)  
v_t <- ladderize(v_t,right=FALSE)

## Number of edges
n_edge <- length(v_t$edge.length)

## Number of sites in sequence
nsites <- 1000

## Simulate rate tree, with a single branch rate different from others
v_r <- v_t
v_r$edge.length[-1L] <- rep(0.001,n_edge-1)
v_r$edge.length[1L] <- 0.01

I then calculate the substitution tree with my simulated time and rate trees, with the desired number of sites. I apply treedater to this substitution tree, using the root-to-tip distances for the sampling times, to obtain the inferred rate and time trees. I perform this operation 100 times.

for (i in seq_len(100)) {
  
  v_p <- v_t
  ## rate x time x sites = expected number of substitutions
  el <- v_r$edge.length * v_t$edge.length * nsites
  v_p$edge.length <- rpois(n=n_edge,lambda=el)/nsites
  
  ## Date-name matrix
  times <- diag(vcv(v_t)) # Getting root-tip distance for all tree tips
  names <- v_t$tip.label
  sts <- setNames(times, names)
  
  ## Date and obtain ratogram for substitution rate across branches
  v_est <- dater(v_p,sts=sts,clock='uncorrelated',s=nsites,ncpu=1) 
}

This results in the following error message in some simulations:

Error in if (pp < 0.025) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In pnbinom(pmax(0, round(intree$edge.length * s)), size = r, prob = 1 -  :
  NaNs produced

Notes from preliminary investigations on my end: it seems like the pnbinom term returns NaN because intree$edge.length is a vector filled with NaN. pp is also NaN, as are p and td. I believe the size parameter r is infinity here, indicating the negative binomial converges to a poisson; I am unsure if this case is unaccounted for, and also wonder if this observation has anything to do with the shape of the likelihood landscape.

## Information on the R environment of the run
Sys.info()

sessionInfo()
@emvolz emvolz added the bug label Sep 10, 2024
@emvolz
Copy link
Owner

emvolz commented Sep 10, 2024

I have reproduced this and it is a bug. I agree that this is probably due to very low rate variation and divergence of the rate variation parameter. While I investigate further, there is a workaround: You can use the 'additive' clock model which will also give you an estimate of rate variation but seems to not have this bug.

@avish96
Copy link
Author

avish96 commented Sep 18, 2024

Sounds good, thanks so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants