Code for analyses and simulations related to missing eligibility data in electronic health record-based observational studies that use target trial emulation. For more information, see our manuscript:
Benz, L., Mukherjee, R., Wang, R., Arterburn, D., Fischer, H., Lee, C., Shortreed, S., and Haneuse, S. "Adjusting for Selection Bias Due to Missing Eligibility Criteria in Emulated Target Trials." Under review, 2024. (Pre-Print)
- build_complete_case_T2DM_population.R: Build dataset which will define the notion of a complete case population for simulations.
- clean_weights.R: Cleans weights for all measures.
- microvascular_dataset.R: Creates dataset for replicating (O'Brien, 2018)
- microvascular_dataset_tte.R: Creates datasets necessary for replicating (O'Brien, 2018) in TTE format
- save_parquet.R: Script to save out some of the larger .sas7bdat files in .parquet format for fast reading
- helpers.R: Useful helper functions
- img_to_pdf.R: Script to combine all .png image files into 1 PDF
- fit_models.R: Fits and saves out model objects
- f31_figures.R: Figures/table code
- helpers.R: Helper functions for simulations
Simulations based on missing data in the eligibility criteria
- weight_trajectories_function.R: Functions to generate simulated weight trajectory functions
- eda_plot_trajectories.R: Function to make some plots to explore the weight trajectories
- generate_data.R: Functions to generate simulated data
- specify_inputs.R: Where inputs for simulations are specified
- run_simulation.R: Wrapper to run the simulations
- compute_truth.R: Compute true ATE
- fit_analysis_models.R: Fit analysis model to estimate the ATE
- analysis.R: Analyze results
- explore_sim_params.R: Script to better understand the underlying data generation process for selection simulation settings
- scratch.R: Scratch pad of snippets useful for better understanding the underlying data generation process for selection simulation settings
- boostrap_variance_sim.R: Simulation for testing the validity of the bootstrap for variance.
- save_boostrap_simulated_data.R: Save out data files for bootstrap variance simulation
- analyze_boostrap_variance.R: Analyze results of bootstrap validity
- microvascular_survival.R: Analyze survival for data application mirroring (O'Brien, 2018)
- microvascular_tte_pp.R: Target Trial Emulation for data application mirroring (O'Brien, 2018) (PP)
- microvascular_tte_itt.R: Target Trial Emulation for data application mirroring (O'Brien, 2018) (ITT)
- boostrap_microvascular_iit.R: Show bootstrapped variance over full grid for 100 replicates of ITT
- plot_HR.R: Function to plot discrete hazard ratios.
- plot_missing_data.R: Function to summarize missing data as a function of lookbacks.
- bootstrap_ITT: Final bootstrap (1000 replicates) for ITT based on lookbacks of 3 (BMI) and 12 (A1c)
- bootstrap_PP: Final bootstrap (1000 replicates) for PP based on lookbacks of 3 (BMI) and 12 (A1c)
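
For readers unfamiliar with the bootstrap variance scripts listed above, the general idea can be sketched in base R as follows. This is a minimal illustration on simulated toy data, not the code in this repository: the data, the difference-in-means estimator, and the function names `estimate_effect` and `bootstrap_variance` are all hypothetical stand-ins for the analysis models used in the actual scripts.

```r
# Hypothetical illustration only: nonparametric bootstrap for the variance of
# a simple difference in means (a stand-in for the estimators in this repo).
set.seed(2024)

# Toy data: binary treatment and a continuous outcome (assumed, not real data)
n <- 500
dat <- data.frame(treated = rbinom(n, 1, 0.5))
dat$outcome <- 1 + 0.5 * dat$treated + rnorm(n)

# Estimator: difference in mean outcome between treated and untreated rows
estimate_effect <- function(df) {
  mean(df$outcome[df$treated == 1]) - mean(df$outcome[df$treated == 0])
}

# Resample rows with replacement, re-estimate on each replicate, and
# summarize the spread of the replicate estimates
bootstrap_variance <- function(df, n_boot = 1000) {
  estimates <- vapply(seq_len(n_boot), function(b) {
    idx <- sample.int(nrow(df), replace = TRUE)
    estimate_effect(df[idx, , drop = FALSE])
  }, numeric(1))
  list(estimates = estimates, variance = var(estimates))
}

boot <- bootstrap_variance(dat, n_boot = 1000)
point_estimate <- estimate_effect(dat)

# Percentile-based 95% interval from the bootstrap replicates
ci <- quantile(boot$estimates, c(0.025, 0.975))
```

The replicate counts (100 and 1000) mentioned in the script descriptions would correspond to `n_boot` in this sketch.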
Due to privacy concerns, the raw/derived datasets cannot be stored locally or in GitHub.
- simulations/: Folder of simulation inputs and outputs
- microvascular_results/: Results for effect of bariatric surgery on microvascular outcomes from sensitivity analysis.
Figures saved out from various analyses
.sh files for transferring files to and from the cluster.
.sh files for batch jobs on the cluster
- missing_data_sims.sh: Job file for missing data simulations
- microvascular_pp.sh: Run analysis for target trial emulation for microvascular outcomes (PP)
- microvascular_itt.sh: Run analysis for target trial emulation for microvascular outcomes (ITT)
- boot_microvasular_itt.sh: Compute bootstrapped variance over full grid for 100 replicates of ITT
- boot_final_itt.sh: Compute bootstrapped variance for final choice of lookbacks for 1000 replicates of ITT
- boot_final_pp.sh: Compute bootstrapped variance for final choice of lookbacks for 1000 replicates of PP
- data_prep_bootsrap_variance_sims.sh: Data prep for bootstrap variance simulations
- bootstrap_variance_sim.sh: Bootstrap variance job file
- boot_variance_all.sh: Wrapper for all boot variance jobs