Updates to stats vignette #406

wolbersm · 2024-01-18T10:46:28Z

I have updated the stats vignette (revised the section on standard errors and added links to the upcoming new vignette) and modified the references bib accordingly. Could you please have a look and, if you are happy, approve the changes?

Thanks,
Marcel

Added Lu2021 and updated two other references. Signed-off-by: wolbersm <[email protected]>

Updated section on Standard errors of the treatment effect Signed-off-by: wolbersm <[email protected]>

vignettes/stat_specs.Rmd

Yes, sorry for the blunder and thanks for spotting this!! Co-authored-by: Alessandro Noci <[email protected]> Signed-off-by: wolbersm <[email protected]>

vignettes/stat_specs.Rmd

nociale

Just made a few comments, looks good! :)

Co-authored-by: Alessandro Noci <[email protected]> Signed-off-by: wolbersm <[email protected]>

Yes, good suggestion. Please make sure that this is also adapted in the actual new vignette. Co-authored-by: Alessandro Noci <[email protected]> Signed-off-by: wolbersm <[email protected]>

gowerc · 2024-01-18T12:28:15Z

Hi both, just to say please commit the updated html to the repo as the vignettes are not rebuilt at package installation time but are taken "as-is".

EDIT - Also please update the news file to give a brief summary of what was changed with this update :)

nociale · 2024-01-18T16:43:59Z

@gowerc I have added the updated html. In addition to this, I have included a new vignette (called CondMean_Inference). I have included it by doing the following:

Add the rmd and html files
Add the html.asis file
Update the build.R file
Update news file

Could you please check that this is done correctly?

In addition, it would be great if you could review the vignette and its code.

Thanks!

nociale · 2024-01-18T16:46:42Z

PS: this closes #403

vignettes/references.bib

NEWS.md

vignettes/CondMean_Inference.html.asis

vignettes/stat_specs.Rmd

vignettes/CondMean_Inference.Rmd

gowerc · 2024-01-19T15:33:27Z

vignettes/CondMean_Inference.Rmd

+As described in section 3.10.2 of the statistical specifications of the package (`vignette(topic = "stat_specs", package = "rbmi")`), two different types of variance estimators have been proposed for reference-based imputation methods in the statistical literature (@Bartlett2021). The first is the frequentist variance which describes the actual repeated sampling variability of the estimator and results in inference which is correct in the frequentist sense, i.e. hypothesis tests have accurate type I error control and confidence intervals have correct coverage probabilities under repeated sampling if the reference-based assumption is correctly specified (@Bartlett2021, @Wolbers2021). Reference-based missing data assumption are strong and borrow information from the control arm for imputation in the active arm. As a consequence, the size of frequentist standard errors for treatment effects may decrease with increasing amounts of missing data. The second is the so-called "information-anchored" variance which was originally proposed in the context of sensitivity analyses (@CroEtAl2019). This variance estimator is based on disentangling point estimation and variance estimation altogether. The resulting information-anchored variance is typically very similar to the variance under missing-at-random (MAR) imputation and increases with increasing amounts of missing data at approximately the same rate as MAR imputation. However, the information-anchored variance does not reflect the actual variability of the reference-based estimator and the resulting frequentist inference is highly conservative resulting in a substantial power loss. 
+
+Reference-based conditional mean imputation combined with a resampling method such as the jackknife or the bootstrap was first introduced in @Wolbers2021. This approach naturally targets the frequentist variance. The information-anchored variance is typically estimated using Rubin's rules for Bayesian multiple imputation which are not applicable within the conditional mean imputation framework. However, an alternative information-anchored variance proposed by @Lu2021 can easily be obtained as we show below. The basic idea of @Lu2021 is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator. For conditional mean imputation, the proposal by @Lu2021 can be implemented by choosing the delta-adjustment as the difference between the conditional mean imputation under the chosen reference-based assumption and MAR on the original dataset. The variance can then be obtained via the jackknife or the bootstrap while keeping the delta-adjustment fixed. The resulting variance estimate is very similar to Rubin's variance. Moreover as shown in @CroEtAl2019, the variance of MAR-imputation combined with a delta-adjustment achieves even better information-anchoring properties than Rubin's variance for reference-based imputation.
+


For the sake of my own understanding, can I ask why this is?

My guess is that if you have lots of data in the control arm, then all missing data essentially gets filled in by the mean (well, the mean conditioned on covariates). Thus, the more missing data you have, the more observations are imputed at the mean, so there is less variability in the data you are analysing. Is that roughly right?
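
One way to probe this guess numerically is a toy sketch. It is an illustrative assumption only: it is not rbmi code, not the vignette's method, and not the authors' reply. It assumes a single visit, no covariates, a fully observed control arm, and missing active-arm outcomes imputed with the control-arm mean. All names (`ref_based_estimate`, `jackknife_se`, `with_missing`) are hypothetical. In this setting the reference-based estimate equals the proportion observed times the MAR estimate, so its jackknife standard error would be expected to shrink as more active-arm outcomes go missing.

```python
import random
from statistics import fmean


def ref_based_estimate(control, active):
    """Treatment effect after imputing missing active outcomes (None)
    with the control-arm mean (toy reference-based conditional mean)."""
    ctrl_mean = fmean(control)
    completed = [ctrl_mean if y is None else y for y in active]
    return fmean(completed) - ctrl_mean


def jackknife_se(control, active, estimator):
    """Leave-one-patient-out jackknife SE; the imputation is redone
    from scratch in every leave-one-out sample."""
    n = len(control) + len(active)
    loo = [estimator(control[:i] + control[i + 1:], active)
           for i in range(len(control))]
    loo += [estimator(control, active[:i] + active[i + 1:])
            for i in range(len(active))]
    centre = fmean(loo)
    return ((n - 1) / n * sum((t - centre) ** 2 for t in loo)) ** 0.5


def with_missing(values, n_missing):
    """Set the first n_missing outcomes to missing (data are i.i.d.,
    so this is missing completely at random in the toy setting)."""
    return [None] * n_missing + values[n_missing:]


# Simulated toy data (assumed values, not from any real trial).
rng = random.Random(2024)
control = [rng.gauss(0.0, 1.0) for _ in range(200)]
active_full = [rng.gauss(0.5, 1.0) for _ in range(200)]

# Jackknife SE with 10% and with 60% of active-arm outcomes missing.
se_by_missing = {
    k: jackknife_se(control, with_missing(active_full, k), ref_based_estimate)
    for k in (20, 120)
}
for k, se in se_by_missing.items():
    print(f"{k} of 200 active outcomes missing: jackknife SE = {se:.4f}")
```

In this simplified setting the shrinkage comes from the imputed values pulling the active-arm mean towards the control-arm mean, which is in line with the intuition described above. The real methods condition on covariates and earlier visits.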

gowerc · 2024-01-19T15:35:32Z

vignettes/CondMean_Inference.Rmd

Suggested change

However, an alternative information-anchored variance proposed by @Lu2021 can easily be obtained as shown below.

The basic idea of @Lu2021 is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator.

gowerc · 2024-01-19T15:39:37Z

vignettes/CondMean_Inference.Rmd

Apologies, I'm not sure I understand this. Based on this description, my understanding is that your per-patient estimates are the MAR estimate + (Reference estimate - MAR estimate).

Doesn't this just give you the Reference estimates as your imputed values? So why would the variances be different? (Apologies, I appreciate this is likely a stupid question.)
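
A toy numerical sketch of this question follows. It is an illustrative assumption only: it is not rbmi code, not the vignette's implementation, and not the authors' reply. It assumes a single visit, no covariates, and a fully observed control arm. Reference-based imputation fills missing active outcomes with the control mean, and MAR imputation fills them with the observed active mean. The names `effect`, `jackknife_se` and `observed` are hypothetical. On the original data, MAR plus delta does reproduce the reference-based imputed values, as the comment suggests. The difference can only arise during resampling: delta stays fixed at its original-data value while the MAR imputation is recomputed in each jackknife sample.

```python
import random
from statistics import fmean


def observed(active):
    """Observed (non-missing) active-arm outcomes."""
    return [y for y in active if y is not None]


def effect(control, active, mode, delta=0.0):
    """Difference in arm means after conditional mean imputation.
    mode="ref": impute with the control mean (toy reference-based rule).
    mode="mar": impute with the observed active mean.
    delta is added to the imputed values only."""
    fill = fmean(control) if mode == "ref" else fmean(observed(active))
    completed = [fill + delta if y is None else y for y in active]
    return fmean(completed) - fmean(control)


def jackknife_se(control, active, estimator):
    """Leave-one-patient-out jackknife SE; imputation is redone each time."""
    n = len(control) + len(active)
    loo = [estimator(control[:i] + control[i + 1:], active)
           for i in range(len(control))]
    loo += [estimator(control, active[:i] + active[i + 1:])
            for i in range(len(active))]
    centre = fmean(loo)
    return ((n - 1) / n * sum((t - centre) ** 2 for t in loo)) ** 0.5


# Simulated toy data (assumed values): half of the active arm is missing.
rng = random.Random(2024)
control = [rng.gauss(0.0, 1.0) for _ in range(200)]
active = [None] * 100 + [rng.gauss(0.5, 1.0) for _ in range(100)]

# Delta computed ONCE on the original data: reference fill minus MAR fill.
delta = fmean(control) - fmean(observed(active))

# Point estimates coincide on the original data.
est_ref = effect(control, active, "ref")
est_anchored = effect(control, active, "mar", delta)

# Frequentist: reference-based imputation is redone in every jackknife sample.
se_frequentist = jackknife_se(control, active, lambda c, a: effect(c, a, "ref"))
# Delta-adjusted: MAR imputation is redone, delta is held fixed.
se_anchored = jackknife_se(control, active, lambda c, a: effect(c, a, "mar", delta))
# Plain MAR, for comparison.
se_mar = jackknife_se(control, active, lambda c, a: effect(c, a, "mar"))

print(f"estimates: ref={est_ref:.4f}, MAR+delta={est_anchored:.4f}")
print(f"SE: frequentist={se_frequentist:.4f}, "
      f"MAR+fixed delta={se_anchored:.4f}, MAR={se_mar:.4f}")
```

In this toy setting the two point estimates agree, while the standard error with fixed delta would be expected to track the MAR standard error and exceed the frequentist one. This matches the description in the quoted paragraph that the variance is obtained "while keeping the delta-adjustment fixed".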

gowerc · 2024-01-19T15:54:36Z

vignettes/CondMean_Inference.Rmd

Suggested change

Moreover, as shown in @CroEtAl2019, the variance of MAR-imputation combined with a delta-adjustment achieves even better information-anchoring properties than Rubin's variance for reference-based imputation.

vignettes/CondMean_Inference.Rmd

Co-authored-by: Craig Gower-Page <[email protected]> Signed-off-by: wolbersm <[email protected]>

Co-authored-by: Craig Gower-Page <[email protected]> Signed-off-by: Alessandro Noci <[email protected]>

nociale · 2024-01-24T12:23:57Z

@gowerc I have tried to re-write the function using only base R (i.e. avoid dplyr functions) and I have updated the vignette title. I have included most of your comments but there are two remaining suggestions from you that I cannot commit because they are "outdated" (however, I agree with them).

Could you please have a final review and make the final changes? Thank you!

wolbersm added 2 commits January 18, 2024 11:35

Update references.bib

fa476cf

Added Lu2021 and updated two other references. Signed-off-by: wolbersm <[email protected]>

Update stat_specs.Rmd

5cb02e1

Updated section on Standard errors of the treatment effect Signed-off-by: wolbersm <[email protected]>

wolbersm requested a review from nociale January 18, 2024 10:46

nociale reviewed Jan 18, 2024

vignettes/stat_specs.Rmd

Update vignettes/stat_specs.Rmd

0082018

Yes, sorry for the blunder and thanks for spotting this!! Co-authored-by: Alessandro Noci <[email protected]> Signed-off-by: wolbersm <[email protected]>

nociale reviewed Jan 18, 2024

vignettes/stat_specs.Rmd

nociale reviewed Jan 18, 2024

vignettes/stat_specs.Rmd

nociale approved these changes Jan 18, 2024

wolbersm and others added 2 commits January 18, 2024 12:25

Update vignettes/stat_specs.Rmd

0fa455f

Co-authored-by: Alessandro Noci <[email protected]> Signed-off-by: wolbersm <[email protected]>

Update vignettes/stat_specs.Rmd

3ed1f55

Yes, good suggestion. Please make sure that this is also adapted in the actual new vignette. Co-authored-by: Alessandro Noci <[email protected]> Signed-off-by: wolbersm <[email protected]>

little change to stats_specs and include CondMean_Inference new vignette

d55461b

nociale requested a review from gowerc January 18, 2024 16:45

update news

33b412e