Survival models #280
Thanks @sambrilleman for setting this up. At this stage I am trying to make sense of your code / approach, so some questions will follow every now and then. Once this is clarified I will move on and try to contribute :) You mention that for the [...] as well as [...]
I see how one can get a closed form solution if the splines work directly on the time scale, but if you use the log time scale, wouldn't you need to evaluate an integral whose integrand is the product of a weighted sum of the M-splines and exp(integration variable), with the lower integration bound being $-\infty$ and the upper one being $\log t$? I am not aware of a closed form solution like this for M-splines... Maybe I am overlooking something here.
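To spell out the integral in question (a sketch of the change of variables, writing the hazard on the log-time scale as a weighted sum of M-splines $\sum_m w_m M_m(u)$ with $u = \log s$):

$$H(t) = \int_0^t h(s)\,ds = \int_{-\infty}^{\log t} \Big( \sum_{m=1}^{M} w_m M_m(u) \Big) e^{u}\, du,$$

and the Jacobian factor $e^u$ is what prevents this from reducing to the I-spline basis (the antiderivatives of the M-splines) evaluated at $\log t$.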
Glad you both are working on this!
@ermeel86 Sounds good!
Hmm, maybe you are right (and it is safer to just stick with t, rather than log(t)). But I had the impression (without writing down the math) that transforming the time scale would be somewhat irrelevant from the perspective of integrating the basis terms. The I-splines are still just evaluated as the integral of the M-splines, but with respect to whatever transformation of time you choose. So, if you choose log(time) as the time scale for the splines, then you are modelling h(log(t)) using M-splines, and you are modelling H(log(t)) using I-splines... (But I may well be missing something here...!)
@bgoodri @jgabry I'm hoping you can maybe help me with some methodology and implementation.
For stan_surv I am formulating a regression model defined on the hazard scale, that allows for either time-fixed or time-varying coefficients. In survival analysis, the time-varying coefficients are often referred to as "non-proportional hazards", since the effect of the covariate on the hazard is allowed to vary over time.
I am formulating the time-varying coefficient (say beta_p(t)) as a B-spline function, as shown in Equation 4 of the attached PDF. But I want to penalise the spline coefficients to avoid overfitting. I see that in stan_gamm you use a hierarchical smoothing prior for the spline coefficients but they are not penalised; rather, the coefficients are treated as independent of one another (i.e. each spline coefficient has a univariate normal prior).
But since I am using B-splines (whereas I think stan_gamm uses thin plate regression splines) I want to introduce a penalty matrix.
Following Lang and Brezger <https://www.jstor.org/stable/1391151>, I think I can do this via the prior distribution defined in Equation 5 of the attached PDF. If I am correct, that prior distribution for the B-spline coefficients basically equates to a multivariate normal distribution with precision matrix equal to the r-th difference penalty matrix scaled by the smoothing SD. The smoothing SD is then given some hyper-prior as chosen by the user (e.g. exponential, half-normal, half-t or half-Cauchy -- i.e. the equivalent of prior_smooth in stan_gamm).
My question is: I am looking for advice on the implementation of the MVN prior for the B-spline coefficients. You can see my current Stan code here <https://github.com/stan-dev/rstanarm/blob/feature-interval-censored-jm/src/stan_files/surv.stan#L248>. It currently shows a univariate normal prior for each of the B-spline coefficients, and a hyperprior for the smoothing SD (i.e. the same as used in stan_gamm). But the commented-out lines (i.e. here <https://github.com/stan-dev/rstanarm/blob/feature-interval-censored-jm/src/stan_files/surv.stan#L251>) show my initial attempt at a MVN prior with penalty matrix for the coefficients. That code is probably wrong, and in any case I want to be able to reparameterise the MVN prior using Matt's trick, but don't know how to do that for a precision-based MVN distribution.
Any chance you have advice/direction/anything to add on this? If not, a more general question might be "is it possible to apply Matt's trick to a MVN distribution parameterised in terms of its precision matrix?" Thanks in advance!
notes_for_Ben_and_Jonah.pdf <https://github.com/stan-dev/rstanarm/files/2454965/notes_for_Ben_and_Jonah.pdf>
The short answer is that with `stan_gamm4`, the design matrix for the smooth function has been transformed by a SVD so that its columns are orthogonal, and hence the independent normal priors on the coefficients are arguably appropriate. This involves the `diagonalize = TRUE` argument to mgcv::jagam: https://github.com/stan-dev/rstanarm/blob/master/R/stan_gamm4.R#L182
@bgoodri Thanks, that makes sense. But just to elaborate: suppose I wish to specify the coefficients with a MVN prior with penalty matrix; is it possible to specify a non-centered parameterisation for that? The Stan manual shows a MVN non-centered reparameterisation as:

data {
  int<lower=2> K;
  vector[K] mu;
  cov_matrix[K] Sigma;
  ...
transformed data {
  matrix[K, K] L;
  L = cholesky_decompose(Sigma);
}
parameters {
  vector[K] alpha;
  ...
transformed parameters {
  vector[K] beta;
  beta = mu + L * alpha;
}
model {
  alpha ~ normal(0, 1);
  // implies: beta ~ multi_normal(mu, Sigma)
  ...

but I think what I need is a non-centered reparameterisation of beta ~ multi_normal_prec(mu, smooth_sd * Tau), not beta ~ multi_normal(mu, Sigma). In theory, is that simple to specify?
The stochastic representation of the multivariate normal is in terms of the Cholesky factor of the covariance matrix. I guess you could do something with the inverse of a Cholesky factor of a precision matrix, but that would require that you solve a potentially large triangular system at every leapfrog step.
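For concreteness, a minimal sketch of that idea (not rstanarm's actual code; it assumes the penalty matrix Tau is positive definite, whereas an improper RW penalty would first need a small ridge added to its diagonal, and note that smooth_sd enters here as a scale on beta, i.e. precision Tau / smooth_sd^2, rather than the smooth_sd * Tau shorthand above):

data {
  int<lower=2> K;
  vector[K] mu;
  matrix[K, K] Tau;  // fixed penalty (precision) matrix, assumed SPD
}
transformed data {
  matrix[K, K] L = cholesky_decompose(Tau);  // Tau = L * L'
}
parameters {
  vector[K] alpha;
  real<lower=0> smooth_sd;
}
transformed parameters {
  // beta = mu + smooth_sd * inverse(L') * alpha, done via a triangular
  // solve; this solve is the per-leapfrog-step cost mentioned above
  vector[K] beta = mu + smooth_sd * mdivide_right_tri_low(alpha', L)';
}
model {
  alpha ~ normal(0, 1);
  smooth_sd ~ normal(0, 1);  // placeholder hyperprior
  // implies: beta ~ multi_normal_prec(mu, Tau / smooth_sd^2)
}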
Hi, could we do a quick update of all the milestones that are currently accomplished? I'd be able to add a plot method and a posterior_survfit method if they are not ticked yet.
@sambrilleman good idea to use a random-walk (RW) prior for the spline coefficients to model the time-dependent effects. In general I think we should also offer/implement a RW prior for the M-/I-splines in the baseline hazard / cumulative baseline hazard (better a RW prior for [...]).
@ermeel86 Thanks, I would love your advice on this! I had tried to implement the prior distribution as proposed in Lang and Brezger's "Bayesian P-splines" paper, but got completely confused as to the appropriate Stan implementation. But am I correct in thinking that it is somewhat equivalent to a RW prior? (I've not really used RW priors before.) If a RW prior for the B-spline coefficients is the way to go, then what are your thoughts on this document? In the last section, entitled "Location of Knots and the Choice of Priors", Milad discusses smoothing B-splines via a RW prior, and the implementation seems to be much more straightforward than what I was attempting when messing around with the Lang and Brezger approach. The code that Milad provides is basically
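something like the following (a sketch reconstructed from memory of that case study, so the exact variable names and the hyperprior may differ):

data {
  int<lower=2> num_basis;          // number of B-spline basis terms
}
parameters {
  row_vector[num_basis] a_raw;     // standardised RW increments
  real<lower=0> tau;               // smoothing SD
}
transformed parameters {
  row_vector[num_basis] a;         // B-spline coefficients
  a[1] = a_raw[1];
  for (i in 2:num_basis)
    a[i] = a[i - 1] + a_raw[i] * tau;  // first-order random walk
}
model {
  a_raw ~ normal(0, 1);
  tau ~ normal(0, 1);
}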
where the vector of B-spline coefficients is a, built up as a (non-centered) random walk. And, as you mention, a random walk for the baseline hazard coefficients might be the way to go there too.
@csetraynor Thanks, I think I've pretty much sorted a first version of the plot and posterior_survfit methods. To make matters worse, I've muddled branches a bit, because I wanted to incorporate interval censoring in both stan_surv and stan_jm.
See the following (adapted from "Bayesian Regression Modeling with INLA" by Wang et al.):
[image: ar_multivariate_normals] <https://user-images.githubusercontent.com/5255975/46920373-b1ae7780-cfed-11e8-9df5-e3aeccc590ed.png>
This applies for the lag-1 case. I think that one can recover Eq. 5 from the Lang and Brezger paper by taking the limit $\rho\rightarrow 1$. This would then also explain why they say "the prior (5) is improper". We could also add a hyperprior for $\rho$ over $[-1,1]$. So from an analytic perspective I think the two approaches are equivalent; computationally I am not sure. Maybe @bgoodri can help out here?
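For reference, a sketch of the tridiagonal precision matrix of a stationary lag-1 (AR(1)) Gaussian prior — presumably the content of the image above — and its $\rho \to 1$ limit:

$$
Q(\rho) = \frac{1}{\sigma^2}
\begin{pmatrix}
1 & -\rho & & \\
-\rho & 1+\rho^2 & -\rho & \\
 & \ddots & \ddots & \ddots \\
 & & -\rho & 1
\end{pmatrix}
\;\xrightarrow{\;\rho \to 1\;}\;
\frac{1}{\sigma^2}
\begin{pmatrix}
1 & -1 & & \\
-1 & 2 & -1 & \\
 & \ddots & \ddots & \ddots \\
 & & -1 & 1
\end{pmatrix},
$$

and the limiting matrix is exactly the first-difference (RW1) penalty matrix $D_1' D_1$, whose rows sum to zero, so it is rank-deficient — hence the improper prior.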
Most of the time (i.e. all of the time as far as rstanarm is concerned) you would want to do the multivariate version of this rather than the marginal-and-conditionals version. But you would want to write a specialized function that does B' * Q * B for a tridiagonal Q and computes the log-determinant of a tridiagonal Q. Alternatively, you can post-multiply the inverse of the Cholesky factor of that Q by a row-vector of coefficients to non-center the betas.
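A sketch of what such specialized functions might look like (my own illustration, not code from rstanarm; it assumes Q is symmetric positive definite with diagonal d and off-diagonal e, and the naive log(f) at the end would need rescaling for large K to avoid overflow):

functions {
  // quadratic form x' * Q * x for symmetric tridiagonal Q, in O(K)
  real quad_form_tridiag(vector x, vector d, vector e) {
    int K = rows(x);
    real out = dot_product(d, x .* x);
    out += 2 * dot_product(e, head(x, K - 1) .* tail(x, K - 1));
    return out;
  }
  // log determinant of symmetric tridiagonal Q via the continuant
  // recurrence f[k] = d[k] * f[k-1] - e[k-1]^2 * f[k-2], also O(K)
  real log_det_tridiag(vector d, vector e) {
    int K = rows(d);
    real f_prev = 1;
    real f = d[1];
    for (k in 2:K) {
      real f_next = d[k] * f - square(e[k - 1]) * f_prev;
      f_prev = f;
      f = f_next;
    }
    return log(f);
  }
}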
@sambrilleman couldn't we use the methods described in algorithms 2 and 3 of https://arxiv.org/abs/1801.03791 here? Especially the sparse back substitution algorithm mentioned there?
It isn't so complicated: if you can construct the precision matrix, you can use the inverse of its Cholesky factor in an uncentered parameterization of the coefficients. A similar thing came up in https://andrewgelman.com/2016/09/02/two-weird-tricks-for-fast-conditional-autoregressive-models-in-stan/#comment-302825
There you say:
"However, this triangular solve is an O(N^3) operation, so it is slow for large N (which could easily be much larger than in the Scottish Lip Cancer example)."
So if this is the case, the sparse back substitution algorithm from the above reference is asymptotically more efficient, since it runs in linear time. However, I think with your approach you have to do the inverse calculation only once and then a matrix multiplication at every leapfrog step, whereas the approach I refer to would do only a linear-time operation at every leapfrog step, right?
Yeah, you can write your own code to do the bidiagonal solve.
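A sketch of such a hand-rolled solve (my illustration, not rstanarm code; it assumes the relevant factor U is upper bidiagonal with main diagonal d and superdiagonal e, which holds for the Cholesky factor of a tridiagonal RW1-type precision matrix):

functions {
  // solve U * x = b by back substitution in O(K)
  vector bidiag_back_solve(vector d, vector e, vector b) {
    int K = rows(b);
    vector[K] x;
    x[K] = b[K] / d[K];
    for (k in 1:(K - 1)) {
      int i = K - k;
      x[i] = (b[i] - e[i] * x[i + 1]) / d[i];
    }
    return x;
  }
}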
@ermeel86 @bgoodri Just to follow up on this: I've opened a pull request with a first working version of [...].
Hi all, I keep getting errors when the grouping variable is a factor variable with three levels, and a different error message when I change the grouping variable to a numeric. The model is not complex, and it seems to work fine if I try to run it as a stan_glm ignoring the time variable; stan_surv also works fine when I don't include the hierarchical parameter. Just wondering if I am doing anything wrong here that I can work on. Thank you all in advance!
Hi Karl. I've not yet implemented hierarchical parameters for stan_surv. Hierarchical survival models will happen eventually, but I ideally wanted to get a first version of stan_surv merged before adding them.
Ahhhhh ok no problem! It is already really very good! Thank you. Is there any chance you know of any workarounds I may use in the meantime? Again though, thank you!
@karlsmithbyrne If you want to go the Bayesian route with a proportional hazards model, then I'm not sure there are many alternatives. I think brms currently fits exponential and Weibull accelerated failure time (AFT) models with group-specific parameters. I see that you were trying to use an exponential baseline hazard with stan_surv; since an exponential PH model is just a reparameterisation of an exponential AFT model, brms may give you what you need. There may be other packages I'm not aware of that fit Bayesian PH models with random effects -- you could check the CRAN task view. In any case, with only three levels for your grouping factor, you might just include it as a fixed effect -- although I'm not familiar with the research question etc.
Hi guys. Apologies if this is not the correct place to ask this, but I thought I would post about the survival models in this open issue rather than starting a new one. Firstly, I would like to say the preprint on arXiv, "Bayesian Survival Analysis Using the rstanarm R Package", is excellent. I would recommend putting up some code on how to install the specific feature branch though, as I don't think it is intuitive how to download that version of the GitHub rstanarm package (to get the stan_surv() functions) from your preprint.
Secondly, I am implementing a random effects model and am trying to create population-level predictions (i.e. averaging over the random effects but having different predictions for each group-level effect). In this vignette, it seems that I want to set [...]. When I supply levels of the random factor that are already present, the code works, and I think having [...].
Just checking that (a) my interpretation is correct and (b) highlighting that you cannot specify new levels in this branch at the moment when using [...].
Many thanks, and I'm super impressed with the models, the vignettes, and the preprint.
Hey @padpadpadpad... so sorry it has taken me this long to get back to you!!
That is a great idea, I will aim to do that soon. However, the more up to date branch is in fact feature/survival.
That vignette is specific to a certain class of survival models -- joint models for longitudinal and survival data. The [...]
From memory, you should be able to provide new levels. If the [...]
The example should show that, as expected, [...]. If that example doesn't work for you or doesn't help clarify, then perhaps open a separate GitHub issue and I can try to debug the issue for you!
Hey @sambrilleman, thank you so much for your response. I have installed from the survival branch now but cannot get the man pages to install (see this issue). This means I cannot look at [...]. I did manage to get the code working though, so that's great. In ps3, we are predicting for a new level of the random effect and marginalising over the random effect, but is there also a way to not include the random effects in the prediction and instead use only the fixed/treatment effects (akin to [...])? Many thanks
Hmm, sorry, I don't think I implemented that! :-( I guess what you are suggesting is akin to a new group/cluster/factor level for the random effect, but where you know with absolute certainty that their random effect value is 0. So I'd almost argue that such a prediction isn't a realistic acknowledgement of the uncertainty in the model...? But maybe I am wrong...
Hi Sam. These are valid points for the prediction of new individuals. I took the approach of standardising over all individuals within each treatment, but it meant I had to do a separate call of posterior_survfit() for each treatment. This approach seems to work for what I want. Thank you for responding and being so prompt with your responses.
Sweet, glad you found a solution that seems to get you what you need! Yeah, you can even do causal-inference-type things where you standardise over all the individuals (i.e. from both treatments), first assuming they all have treatment fixed to 0 and then treatment fixed to 1, if that's what you want -- so, for example, something like:
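(The original code block was lost here; the following is a hypothetical reconstruction, with mod, dat and trt standing in for the fitted model, the data, and the treatment variable.)

nd0 <- dat; nd0$trt <- 0   # everyone set to untreated
nd1 <- dat; nd1$trt <- 1   # everyone set to treated
ps0 <- posterior_survfit(mod, newdata = nd0, times = 0,
                         extrapolate = TRUE, standardise = TRUE)
ps1 <- posterior_survfit(mod, newdata = nd1, times = 0,
                         extrapolate = TRUE, standardise = TRUE)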
but perhaps reducing [...]. Or whatever makes sense for the type of comparisons you want to make!
Hi all, and thanks for making survival analysis with rstanarm a reality! I encountered a small bug when working through and extending the examples in the paper. In section 6.6 we fit a simple multilevel model. Elsewhere in the paper it is mentioned that the group-specific parameters can be accessed with ranef(), but:

mod_randint <-
  rstanarm::stan_surv(
    formula = Surv(eventtime, status) ~ trt + (1 | site),
    data = frail,
    basehaz = "exp"
  )
rstanarm::ranef(mod_randint)
Error: This method is for stan_glmer and stan_lmer models only.

I am using Linux and the rstanarm survival branch (version 2.19.1).
EDIT: By the way, what is the preferred way to post issues related to the survival functionality in rstanarm -- this issue, a separate issue, or something else? And while I'm at it, do you have any estimate of when you could be merging the feature/survival branch and getting the functionality to CRAN?
Hi @ouzor. Sorry for the slow reply. Coincidental timing actually! This same issue was also recently raised by someone else -- see the issue here. My suggestion there is just to get the random effects from [...]. You are correct about where the issue lies. Hopefully I can just add [...]. For future reference -- yes, it is probably better to open a separate issue for bugs like this; it is easier to track in separate issues. I'm still not sure when the survival branch can get pushed to CRAN. Would have to check with @bgoodri. Cheers!
Thanks for the reply @sambrilleman, using [...] works as a workaround.
Yes, although stan_surv() is not complete, it would be nice to have a minimal version of this feature/survival branch merged into master, so that users can start using it.
Indeed, that would be nice! Unfortunately it uses too much RAM during compilation to get it on CRAN at the moment (at least I think that's the holdup). At one point it seemed like there was a plan to get the memory usage down, but I'm not sure where that stands. @bgoodri? In the meantime, we're trying to see if we can build pre-compiled binaries that we can distribute so people can install it easily. Currently it seems like that is successful on Mac but not yet on Windows: stan-dev/r-packages#2 (comment)
And what about compiling survival models at runtime, or on first use, using Rcpp::sourceCpp()? It is the approach brms uses, as indicated in its paper.
This is really an issue. I have about 20 colleagues who would like to try stan_surv (after a seminar I gave), but installing fails for most of them (mostly on Windows). I personally wasn't able to compile it myself on Windows, nor could I prepare a binary package for Windows.
Same issue occurred to me! I tried compilation on Windows and was not able to with my 8 GB RAM laptop.
Also, I just followed live memory usage on my computer, and it did not reach the memory limit (I have 32 GB), but compilation failed because it could not allocate 40099448 bytes, which is about 40 MB...
This is the size that goes over the allocatable memory. This size does not represent the totally used memory itself.
So this means it is not really a memory problem per se (i.e. people not having enough memory), but another compile / C++ issue? Is it clear what the underlying mechanism is?
To clarify my above post: when your R can allocate, say, 16 GB of memory and you end up not being able to allocate 40 MB as per the above message, this means that R tried to allocate 16 GB + 40 MB in total, at which point it failed.
@ermeel86 Yeah, it is quite a disaster. I personally cannot get the build working either. I have tried both R 4.0.2 and R 3.6.2 (which was the last version I was using when I was actively working on this stuff), but have had no success with either. One additional problem is that I haven't been actively working on this branch for 6 months or more, so I'm not even sure at which point the build started to break (e.g. R 4.x, or a Windows update, or something else). I know very little about these types of compiler issues, or even how the compilation infrastructure for rstanarm works in detail, so I am as stuck as most of the users. One thought I had was attempting to update the [...]
@bgoodri mentioned at the Stan meeting today that adding "--no-multiarch" might help on Windows. That can go in the build_opts argument to devtools::install_github() or the args argument to devtools::build() (if you're building locally). Can someone with a Windows machine try that out? (I don't have one.)
It was also discussed in the Stan meeting today that we can probably fix this by using standalone generated quantities for all of our Stan programs in rstanarm. @bgoodri said he will start working on this soon.
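For anyone wanting to try this, an untested sketch of such a call (the branch name feature/survival is taken from the discussion above):

devtools::install_github(
  "stan-dev/rstanarm",
  ref = "feature/survival",
  build_opts = c("--no-multiarch")
)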
Is there a way to increase the memory available to R for this purpose? I am asking because I checked the overall system memory consumption on my computer during compilation and there were about 18 GB left (unless there was a super fast jump that wasn't visible graphically).
Thanks @jgabry, this is really appreciated. I just tried your compile flag suggestion, and it printed a warning that the [...]
The memory.limit() function can be used to control the amount of memory available to R.
If I interpret the output of that function on my machine correctly, then it is approximately 32 GB. Now I don't understand how your previous comment can explain the memory error: R failed to get the additional 40 MB when I had at least 16 GB left during the test, and my machine has 32 GB in total, which roughly coincides with the limit...
I'm opening this issue so there is a forum in which to discuss progress on getting survival models into rstanarm.
I've started a new survival branch here. It currently has a stan_surv modelling function, calling a surv.stan model file. The modelling function returns an object of class stansurv that inherits stanreg.
The current task list looks something like:
- [ ] print method for stansurv
- [ ] summary method for stansurv
- [ ] prior_summary method for stansurv
- [ ] basehaz_summary method for stansurv
- [ ] posterior_survfit method for stansurv
- [ ] log_lik, loo, and waic methods for stansurv
- [ ] plot method that includes a plot type for the baseline hazard
- [ ] stan_jm working with stan_surv
- [ ] tt approach, or something else (e.g. a tde() special function for use in the formula)