The materials in this section of the repository were used in the study:
H. Lamba, R. Ghani*, and K.T. Rodolfa*. An Empirical Comparison of Bias Reduction Methods on Real-World Problems in High-Stakes Policy Settings. SIGKDD Explorations 23, 69-85. Jun. 2021. Available on arXiv: 2105.06442
sampling_methods: Contains code for sampling the dataset in different configurations.
Run it with: python test_sampling.py test_sampling_config.yaml
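The sampling configurations themselves live in test_sampling_config.yaml and are not reproduced here. Purely as a hypothetical illustration of one configuration of this kind, the sketch below oversamples smaller demographic groups until every group has as many rows as the largest one. The function name, the dict-per-row format, and the choice of sampling with replacement are all assumptions, not the repository's implementation.

```python
import random
from collections import defaultdict


def balance_groups(rows, group_key, seed=0):
    """Resample rows so every group has as many rows as the largest group.

    Hypothetical illustration: rows are dicts and group_key names the
    demographic attribute. Smaller groups are oversampled with replacement.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw extra rows (with replacement) until this group reaches the target size
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```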
triage: Contains all the code to run triage, which takes a triage config file as input.
- Run triage on the dataset with the original configuration.
- Run sampling to generate the sampled matrices.
- For each sampling config, run triage with replace=False, change model_comment, and set project_dir to the directory of the sampled matrices.
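The loop over sampling configs can be sketched as building one run specification per sampled-matrix directory. This is a hypothetical helper, not part of the repository: it assumes the triage config has already been parsed into a dict, and the run-spec keys simply mirror the settings named above. Keeping replace=False is presumably what lets triage reuse the sampled matrices already in each directory rather than rebuilding them.

```python
import copy


def make_sampled_runs(base_config, sampled_dirs):
    """Build one run spec per sampled-matrix directory.

    Hypothetical helper: base_config is the parsed triage config (a dict);
    sampled_dirs maps a sampling config name to the directory holding its
    sampled matrices.
    """
    runs = []
    for name, matrix_dir in sampled_dirs.items():
        config = copy.deepcopy(base_config)  # never mutate the original config
        # Tag the models so each sampling configuration can be told apart later
        config["model_comment"] = f"{base_config.get('model_comment', 'baseline')}_{name}"
        runs.append({
            "config": config,
            "project_dir": matrix_dir,  # point triage at the sampled matrices
            "replace": False,           # assumed: reuse existing matrices instead of rebuilding
        })
    return runs
```

Each spec would then be handed to triage; the triage documentation describes how the project directory and replace settings are passed to an experiment.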
Removing demographic features:
- In the original triage config file, remove the features that carry demographic information.
- Run triage with a different project_dir and model_comment.
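Assuming the triage config has been loaded into a dict and that features are declared in a feature_aggregations list whose entries carry a prefix key, removing demographic feature groups could look like the sketch below. The prefix names to exclude depend on the project's config and are hypothetical here.

```python
import copy


def drop_feature_groups(config, excluded_prefixes, model_comment):
    """Return a copy of a parsed triage config without the given feature groups.

    Assumes features are declared under "feature_aggregations", each entry
    having a "prefix" key; excluded_prefixes is a hypothetical set of the
    prefixes that carry demographic information.
    """
    cleaned = copy.deepcopy(config)  # leave the original config untouched
    cleaned["feature_aggregations"] = [
        block
        for block in cleaned.get("feature_aggregations", [])
        if block.get("prefix") not in excluded_prefixes
    ]
    # A distinct comment keeps these models separate from the original run
    cleaned["model_comment"] = model_comment
    return cleaned
```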
Zafar et al. methods:
- Clone the python3 branch of shaycrk/fair-classification into this folder. This is a modification of Zafar's fair-classification repository that works with Python 3. Note its dependency on shaycrk/dccp, which should be handled by installing the packages listed in requirements.txt from the cloned repository.
- See the zafar_methods folder for details.
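For orientation, Zafar et al.'s disparate mistreatment approach constrains the covariance between a binary group indicator z and g(y, x) = min(0, y · d(x)), where y ∈ {-1, +1} is the label and d(x) is the signed distance to the decision boundary, so g is nonzero only for misclassified points. The sketch below only estimates that quantity for a fixed set of predictions; it is an illustration written for this README, not code from the cloned repository, which performs the constrained training itself.

```python
def mistreatment_covariance(z, y, d, restrict_to=None):
    """Estimate Cov(z, min(0, y * d(x))), the disparate mistreatment proxy.

    z: binary group indicator (0/1); y: labels in {-1, +1};
    d: signed distances to the decision boundary.
    restrict_to: if given (e.g. +1), keep only points with that label,
    which targets differences in false negative rates.
    """
    points = [
        (zi, yi, di)
        for zi, yi, di in zip(z, y, d)
        if restrict_to is None or yi == restrict_to
    ]
    if not points:
        return 0.0
    z_bar = sum(zi for zi, _, _ in points) / len(points)
    # min(0, y * d) is negative only for misclassified points (y and d disagree in sign)
    return sum((zi - z_bar) * min(0.0, yi * di) for zi, yi, di in points) / len(points)
```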
Fairness-aware model selection:
- Run triage to generate a grid of models.
- Follow the analysis in model_selection/fairness_model_selection.ipynb to account for fairness in the model selection process, either by setting a maximum allowable disparity or a maximum allowable loss in accuracy.
Decoupled models:
- Create two configs from the original config, modifying the cohort information so that each config runs only on the subset of entities belonging to a single demographic group.
- Run triage with each config.
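Generating the per-group configs could be sketched as below. This assumes the cohort is defined by a SQL query under cohort_config in a parsed triage config, and that a hypothetical template with a {group} placeholder restricts the cohort to one group; giving each run its own model_comment is likewise an assumption, mirroring the other steps. str.replace is used instead of str.format so that any other placeholders in the query survive.

```python
import copy


def make_group_configs(base_config, groups, cohort_query_template):
    """Create one triage-style config per demographic group.

    Hypothetical: cohort_query_template is a SQL string with a {group}
    placeholder that limits the cohort to one group; the
    "cohort_config"/"query" keys are assumed to match the triage config.
    """
    configs = {}
    for group in groups:
        config = copy.deepcopy(base_config)  # keep the original config intact
        # str.replace leaves other placeholders (such as a date placeholder) untouched
        query = cohort_query_template.replace("{group}", group)
        config.setdefault("cohort_config", {})["query"] = query
        config["model_comment"] = f"{base_config.get('model_comment', 'decoupled')}_{group}"
        configs[group] = config
    return configs
```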
Note: Some very preliminary code exploring an "ablation" study that separates the effects of decoupling from those of recall-equalizing score thresholds can be found in the composite_ablation/ directory. This code naively assumes that the scores from the different models are comparable, which is generally not a reasonable assumption but may be a common one when applying decoupling approaches.
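The naive composition that note refers to can be sketched as pooling the per-group models' scores and taking the overall top k. Doing so only makes sense if the scores are on a comparable scale, which is exactly the questionable assumption flagged in the note; the function name and data format are hypothetical.

```python
def naive_composite_top_k(group_scores, k):
    """Pool scores from per-group models and take the overall top k.

    group_scores maps a group name to a list of (entity_id, score) pairs
    produced by that group's model. Ranking them together treats scores
    from different models as directly comparable (the naive assumption).
    """
    pooled = [
        (score, entity_id, group)
        for group, scored in group_scores.items()
        for entity_id, score in scored
    ]
    # Sort on the score only, highest first
    pooled.sort(key=lambda item: item[0], reverse=True)
    return [(entity_id, group) for _, entity_id, group in pooled[:k]]
```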
For any of the above methods, you can run RecallAdjuster to balance equity across groups while minimizing the loss in precision (see each project's folder in RecallAdjuster for details).
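RecallAdjuster's own interface is documented in its folder and is not reproduced here. As a rough sketch of the underlying idea, under the assumption that it resembles recall-equalizing thresholds, one can choose how many top-scored entities to select in each group so that all groups reach the same recall on labeled historical data, taking the highest common recall that fits within a total budget of k. The function names and the simple grid search over target recalls are illustrative choices, not RecallAdjuster's implementation.

```python
import math


def count_for_recall(sorted_labels, target_recall):
    """Smallest number of top-ranked entities whose recall reaches target_recall.

    sorted_labels: 0/1 outcomes ordered by descending model score.
    """
    total_positives = sum(sorted_labels)
    # Small epsilon guards against floating-point error in target * positives
    needed = math.ceil(target_recall * total_positives - 1e-9)
    if needed <= 0:
        return 0
    found = 0
    for n, label in enumerate(sorted_labels, start=1):
        found += label
        if found >= needed:
            return n
    return len(sorted_labels)


def equal_recall_counts(scores, labels, groups, k, steps=100):
    """Per-group selection sizes giving (approximately) equal recall within budget k.

    Scans target recalls 0, 1/steps, ..., 1 and keeps the highest one whose
    total selection size still fits in k.
    """
    ranked = {}
    for score, label, group in zip(scores, labels, groups):
        ranked.setdefault(group, []).append((score, label))
    sorted_labels = {
        group: [label for _, label in sorted(rows, key=lambda r: r[0], reverse=True)]
        for group, rows in ranked.items()
    }
    best = {group: 0 for group in sorted_labels}
    for i in range(steps + 1):
        target = i / steps
        counts = {g: count_for_recall(lbls, target) for g, lbls in sorted_labels.items()}
        if sum(counts.values()) > k:
            break  # counts only grow with the target, so no higher target can fit
        best = counts
    return best
```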