How to select predictive models for causal inference

Overview

This package contains simulations for causal inference, ATE and CATE estimators, and the code for the experiments described in the paper: How to select predictive models for causal inference?

Package Features

The package code is contained in the caussim directory:

  • estimation contains CATE and ATE estimators usable with any scikit-learn-compatible base estimator, and meta-learners such as TLearner, SLearner, or RLearner (see the first sketch after this list).

  • simulations contains data simulations based on basis expansions (Nystroem and Splines are available; see the second sketch after this list).

  • experiences contains the code used to run the extensive evaluations of causal metrics on ACIC 2016 and on handcrafted simulations.

  • reports contains the scripts used to derive the figures and tables presented in the paper. The main results are obtained by launching the reports/causal_scores_evaluation.py script (see Reports below).

  • utils.py contains plotting utilities.

  • pdistances contains naive implementations of the MMD, Total Variation, and Jensen-Shannon divergences used to measure population overlap (see the third sketch after this list).

  • demos contains notebooks used to create the toy examples and risk maps for the 2D simulations.

  • data contains utilities to load the semi-simulated datasets (ACIC 2016, ACIC 2018, TWINS). A dedicated README is available in the root data folder.
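
First sketch: a minimal T-learner written directly on top of scikit-learn. This is not caussim's own API (its class names and signatures may differ); it only illustrates the logic such meta-learners implement: fit separate outcome models on treated and control units and take the difference of their predictions as the CATE.

import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor

# Toy data: covariates X, binary treatment a, outcome y with effect 1 + X[:, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
a = rng.binomial(1, 0.5, size=1000)
y = X[:, 0] + a * (1 + X[:, 1]) + rng.normal(scale=0.1, size=1000)

base = GradientBoostingRegressor()
mu_1 = clone(base).fit(X[a == 1], y[a == 1])  # outcome model on treated units
mu_0 = clone(base).fit(X[a == 0], y[a == 0])  # outcome model on control units

cate = mu_1.predict(X) - mu_0.predict(X)  # per-unit treatment effect estimates
ate = cate.mean()                         # ATE as the average of the CATE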
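
Second sketch: the two basis expansions named above are available in scikit-learn itself. The snippet below is an assumption about usage, not caussim's simulation API.

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))

# Random-feature approximation of an RBF kernel map.
phi_nystroem = Nystroem(kernel="rbf", n_components=10, random_state=0).fit_transform(X)
# Cubic B-spline features, expanded independently per input column.
phi_splines = SplineTransformer(degree=3, n_knots=5).fit_transform(X)
print(phi_nystroem.shape, phi_splines.shape)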
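
Third sketch: as an example of what pdistances computes, a naive histogram-based Total Variation distance between the treated and control distributions of a one-dimensional score can be written as below. This is a sketch of the binned-density approach, not the package's exact implementation.

import numpy as np

def total_variation(x_treated, x_control, bins=50):
    """Naive TV distance, 0.5 * sum |p - q|, over a shared histogram grid."""
    lo = min(x_treated.min(), x_control.min())
    hi = max(x_treated.max(), x_control.max())
    p, edges = np.histogram(x_treated, bins=bins, range=(lo, hi))
    q, _ = np.histogram(x_control, bins=edges)
    return 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()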

Experiences

Experience outputs are mainly CSV files (one per sampled dataset). To launch an experience, run python scripts/experiences/<experience.py>; it should write the CSVs to a dedicated folder under data/experiences/<dataset>/<experience_name>.
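
Since each sampled dataset yields its own CSV, downstream analyses typically concatenate them. A hypothetical aggregation sketch, assuming pandas is available (the path below is only an example following the layout described above):

from pathlib import Path
import pandas as pd

# Hypothetical example path: data/experiences/<dataset>/<experience_name>.
results_dir = Path("data/experiences/acic_2016/causal_scores_evaluation")
frames = [pd.read_csv(f) for f in sorted(results_dir.glob("*.csv"))]
results = pd.concat(frames, ignore_index=True)
print(results.shape)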

🔎 To replicate the main experience of the paper (Section 5), launch the script scripts/experiences/causal_scores_evaluation.py. Make sure that the dataset configuration at the beginning of the file is:

from caussim.experiences.base_config import DATASET_GRID_FULL_EXPES
DATASET_GRID = DATASET_GRID_FULL_EXPES

📢 Note that the results of Section 5 are already provided in the Zenodo archive experiences.zip.

Reports

Report outputs are mainly figures for the paper. To obtain the results, run pytest scripts/reports/<report.py>; it should output the figures to one or several corresponding folders under figures/.

The main report type is a pytest function contained in the reports/causal_scores_evaluation.py script. For each macro-dataset, it plots the results of running a given set of candidate estimators with a fixed nuisance estimator on several generation processes of the macro-dataset (often hundreds of sampled datasets).

🔎 To replicate the main figure of the paper (Figure 3), launch the script scripts/reports/_1_r_risk_domination.py. It may take some time because of the large number of simulation results. Make sure that the appropriate experience results exist; the ones used in the paper are provided in experiences.zip.

pytest scripts/reports/causal_scores_evaluation.py

Installation

  • We recommend using poetry with python>=3.9 to manage dependencies.

You can install caussim via poetry:

poetry install

or

via pip. In this case, you also need to install the dependencies listed in pyproject.toml:

pip install caussim

Dependencies:

python = ">=3.9, <3.11"
python-dotenv = "^0.15.0"
click = "^8.0.1"
yapf = "^0.31.0"
matplotlib = "^3.4.2"
numpy = "^1.20.3"
seaborn = "^0.11.1"
jupytext = "^1.11.5"
rope = "^0.19.0"
scikit-learn = "^1.0"
jedi = "^0.18.0"
tqdm = "^4.62.3"
tabulate = "^0.8.9"
statsmodels = "^0.13.1"
pyarrow = "^6.0.1"
submitit = "^1.4.1"
rpy2 = "^3.4.5"
moepy = "^1.1.4"