Updates to stats vignette #406

Merged: 17 commits, Jan 25, 2024

Conversation

wolbersm
Collaborator

Hi @nociale

I have updated the stats vignette (revised the section on standard errors and added links to the upcoming new vignette) and modified the references bib accordingly. Could you please have a look and, if you are happy, approve the changes?

Thanks,
Marcel

Added Lu2021 and updated two other references.

Signed-off-by: wolbersm <[email protected]>
Updated section on Standard errors of the treatment effect

Signed-off-by: wolbersm <[email protected]>
@wolbersm wolbersm requested a review from nociale January 18, 2024 10:46
vignettes/stat_specs.Rmd (outdated, resolved)
Yes, sorry for the blunder and thanks for spotting this!!

Co-authored-by: Alessandro Noci <[email protected]>
Signed-off-by: wolbersm <[email protected]>
vignettes/stat_specs.Rmd (outdated, resolved)
vignettes/stat_specs.Rmd (outdated, resolved)
Collaborator

@nociale nociale left a comment

Just made a few comments, looks good! :)

wolbersm and others added 2 commits January 18, 2024 12:25
Co-authored-by: Alessandro Noci <[email protected]>
Signed-off-by: wolbersm <[email protected]>
Yes, good suggestion. Please make sure that this is also adapted in the actual new vignette.

Co-authored-by: Alessandro Noci <[email protected]>
Signed-off-by: wolbersm <[email protected]>
@gowerc
Collaborator

gowerc commented Jan 18, 2024

Hi both, just to say please commit the updated html to the repo as the vignettes are not rebuilt at package installation time but are taken "as-is".

EDIT - Also please update the news file to give a brief summary of what was changed with this update :)

@nociale
Collaborator

nociale commented Jan 18, 2024

@gowerc I have added the updated html. In addition to this, I have included a new vignette (called CondMean_Inference). I have included it by doing the following:

  • Add the rmd and html files
  • Add the html.asis file
  • Update the build.R file
  • Update news file

Could you please check that this is done correctly?

In addition, it would be great if you could review the vignette and its code.

Thanks!

@nociale nociale requested a review from gowerc January 18, 2024 16:45
@nociale
Collaborator

nociale commented Jan 18, 2024

PS: this closes #403

NEWS.md (outdated, resolved)
vignettes/stat_specs.Rmd (outdated, resolved)
vignettes/stat_specs.Rmd (outdated, resolved)
As described in section 3.10.2 of the statistical specifications of the package (`vignette(topic = "stat_specs", package = "rbmi")`), two different types of variance estimators have been proposed for reference-based imputation methods in the statistical literature (@Bartlett2021). The first is the frequentist variance which describes the actual repeated sampling variability of the estimator and results in inference which is correct in the frequentist sense, i.e. hypothesis tests have accurate type I error control and confidence intervals have correct coverage probabilities under repeated sampling if the reference-based assumption is correctly specified (@Bartlett2021, @Wolbers2021). Reference-based missing data assumption are strong and borrow information from the control arm for imputation in the active arm. As a consequence, the size of frequentist standard errors for treatment effects may decrease with increasing amounts of missing data. The second is the so-called "information-anchored" variance which was originally proposed in the context of sensitivity analyses (@CroEtAl2019). This variance estimator is based on disentangling point estimation and variance estimation altogether. The resulting information-anchored variance is typically very similar to the variance under missing-at-random (MAR) imputation and increases with increasing amounts of missing data at approximately the same rate as MAR imputation. However, the information-anchored variance does not reflect the actual variability of the reference-based estimator and the resulting frequentist inference is highly conservative resulting in a substantial power loss.

Reference-based conditional mean imputation combined with a resampling method such as the jackknife or the bootstrap was first introduced in @Wolbers2021. This approach naturally targets the frequentist variance. The information-anchored variance is typically estimated using Rubin's rules for Bayesian multiple imputation which are not applicable within the conditional mean imputation framework. However, an alternative information-anchored variance proposed by @Lu2021 can easily be obtained as we show below. The basic idea of @Lu2021 is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator. For conditional mean imputation, the proposal by @Lu2021 can be implemented by choosing the delta-adjustment as the difference between the conditional mean imputation under the chosen reference-based assumption and MAR on the original dataset. The variance can then be obtained via the jackknife or the bootstrap while keeping the delta-adjustment fixed. The resulting variance estimate is very similar to Rubin's variance. Moreover as shown in @CroEtAl2019, the variance of MAR-imputation combined with a delta-adjustment achieves even better information-anchoring properties than Rubin's variance for reference-based imputation.

Collaborator

For the sake of my own understanding, can I ask why this is?

My guess is that if you have lots of data in the control arm, then all missing data essentially gets filled in by the mean (well, the mean conditioned on covariates), so the more missing data you have, the more observations are imputed at the mean and the less variability there is in the data you are analysing. Is that roughly right?
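
A toy calculation may help with this question. It is only a sketch under strong simplifying assumptions that are not taken from the vignette: a single visit, no covariates, independent arms, $n_a$ active-arm patients of whom a fixed fraction $p < 1$ have a missing outcome, $n_c$ fully observed control patients, a common outcome variance $\sigma^2$, and missing active-arm outcomes imputed with the observed control mean $\bar{y}_c$ (a jump-to-reference-like rule). All symbols are introduced here for illustration only.

```latex
\begin{align*}
\hat{\theta}_{\text{ref}}
  &= (1-p)\,\bar{y}_a^{\text{obs}} + p\,\bar{y}_c - \bar{y}_c
   = (1-p)\,\bigl(\bar{y}_a^{\text{obs}} - \bar{y}_c\bigr), \\
\operatorname{Var}\bigl(\hat{\theta}_{\text{ref}}\bigr)
  &= (1-p)^2 \left[ \frac{\sigma^2}{(1-p)\,n_a} + \frac{\sigma^2}{n_c} \right]
   = \sigma^2 \left[ \frac{1-p}{n_a} + \frac{(1-p)^2}{n_c} \right], \\
\hat{\theta}_{\text{MAR}}
  &= \bar{y}_a^{\text{obs}} - \bar{y}_c, \qquad
\operatorname{Var}\bigl(\hat{\theta}_{\text{MAR}}\bigr)
   = \sigma^2 \left[ \frac{1}{(1-p)\,n_a} + \frac{1}{n_c} \right].
\end{align*}
```

In this toy setting the reference-based estimate is the MAR estimate shrunk by the factor $(1-p)$, so its variance falls as $p$ grows, whereas the MAR variance rises. That is broadly in line with the guess above: the imputed values borrow the control mean, so more missing data means a larger share of the active-arm mean is tied to the control arm.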

Collaborator

Suggested change
However, an alternative information-anchored variance proposed by @Lu2021 can easily be obtained as shown below.
The basic idea of @Lu2021 is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator.

Collaborator

Apologies, I'm not sure I understand this. Based on this description, my understanding is that your per-patient estimates are the MAR estimate + (Reference estimate - MAR estimate).

Doesn't this just give you the Reference estimates as your imputed values? So why would the variances be different (apologies, I appreciate this is likely a stupid question)?
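
One way to explore this question is a small numerical sketch. The Python code below is a hypothetical illustration, not the rbmi implementation: it assumes a single visit, no covariates, reference-based imputation of missing active-arm outcomes with the observed control mean, and MAR imputation with the observed active-arm mean. The function names `estimate` and `jackknife_se` and the data are made up for the example. Following the description quoted above, delta is computed once on the original data and kept fixed across jackknife samples.

```python
from statistics import mean


def estimate(ctrl, act_obs, n_missing, delta=None):
    """Treatment effect (active minus control) after conditional mean imputation.

    delta=None  -> reference-based: impute missing active outcomes with the control mean.
    delta=float -> MAR imputation (observed active mean) plus a fixed delta-adjustment.
    """
    if delta is None:
        imputed = mean(ctrl)
    else:
        imputed = mean(act_obs) + delta
    n_act = len(act_obs) + n_missing
    act_mean = (sum(act_obs) + n_missing * imputed) / n_act
    return act_mean - mean(ctrl)


def jackknife_se(ctrl, act_obs, n_missing, delta=None):
    """Leave-one-patient-out jackknife standard error.

    The imputation is redone on every leave-one-out dataset,
    while delta (if given) is held fixed.
    """
    reps = []
    for i in range(len(ctrl)):  # drop one control patient
        reps.append(estimate(ctrl[:i] + ctrl[i + 1:], act_obs, n_missing, delta))
    for i in range(len(act_obs)):  # drop one observed active patient
        reps.append(estimate(ctrl, act_obs[:i] + act_obs[i + 1:], n_missing, delta))
    for _ in range(n_missing):  # drop one active patient with a missing outcome
        reps.append(estimate(ctrl, act_obs, n_missing - 1, delta))
    n = len(reps)
    centre = mean(reps)
    return ((n - 1) / n * sum((r - centre) ** 2 for r in reps)) ** 0.5


# Made-up data: 6 control patients, 4 observed and 4 missing active patients.
ctrl = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
act_obs = [3.0, 4.0, 6.0, 7.0]
n_missing = 4

# Delta = reference-based imputation minus MAR imputation on the ORIGINAL data.
delta = mean(ctrl) - mean(act_obs)

theta_ref = estimate(ctrl, act_obs, n_missing)        # reference-based estimate
theta_ia = estimate(ctrl, act_obs, n_missing, delta)  # MAR + fixed delta
se_ref = jackknife_se(ctrl, act_obs, n_missing)       # targets the frequentist variance
se_ia = jackknife_se(ctrl, act_obs, n_missing, delta) # information-anchored style

print(theta_ref, theta_ia, se_ref, se_ia)
```

On the original data the two point estimates coincide, because MAR plus delta equals the reference-based imputation there. In each leave-one-out dataset, however, only the MAR part is recomputed, so the imputed values move with the observed active-arm mean rather than with the control mean. In this toy example that makes the second standard error larger than the first.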

Collaborator

Suggested change
Moreover, as shown in @CroEtAl2019, the variance of MAR-imputation combined with a delta-adjustment achieves even better information-anchoring properties than Rubin's variance for reference-based imputation.

wolbersm and others added 5 commits January 19, 2024 18:17
@nociale
Collaborator

nociale commented Jan 24, 2024

@gowerc I have tried to re-write the function using only base R (i.e. avoid dplyr functions) and I have updated the vignette title. I have included most of your comments but there are two remaining suggestions from you that I cannot commit because they are "outdated" (however, I agree with them).

Could you please have a final review and make the final changes? Thank you!

@gowerc gowerc merged commit b247521 into main Jan 25, 2024
4 checks passed
@gowerc gowerc deleted the UpdateStatsVignette branch September 24, 2024 14:53
3 participants