For every patient who
- has had a visit in the past x years (using x=2 for now and defining "been seen" as having had a clinical visit)
- has not been diagnosed with CKD yet (using diagnoses for now, but this should extend to medications and abnormal eGFRs)
- and has not had an eGFR in the past y months
predict the top k individuals (k based on intervention capacity) who are at risk of having an abnormal eGFR in the next z months
- Predict risk of CKD stage 3 or above in the next 12 months (we can vary this)
- Baselines:
- current practice
- clinical guidelines
- CDC adopted screening tool
- Metric: Precision (PPV) at top k (:warning: need to determine k based on capacity)
- Fairness metric: TPR disparity by Race, Gender, SES, access, etc.
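
The two metrics above can be sketched in plain Python. This is a minimal illustration under assumptions, not the project's evaluation code: the function names, the toy scores, and the choice of a reference group for the disparity ratio are all hypothetical. Here TPR disparity is taken to be each group's recall among the top k divided by the reference group's recall.

```python
from collections import defaultdict


def precision_at_top_k(scores, labels, k):
    """Fraction of true positives among the k highest-scored patients."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def tpr_by_group(scores, labels, groups, k):
    """Recall (TPR) within each group when the top k patients are flagged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:k])
    positives = defaultdict(int)
    caught = defaultdict(int)
    for i, (label, group) in enumerate(zip(labels, groups)):
        if label == 1:
            positives[group] += 1
            if i in flagged:
                caught[group] += 1
    # Groups with no positive labels are omitted (their TPR is undefined).
    return {g: caught[g] / positives[g] for g in positives}


def tpr_disparity(tprs, reference_group):
    """Ratio of each group's TPR to the reference group's TPR."""
    reference = tprs[reference_group]
    if reference == 0:
        raise ValueError("reference group TPR is zero; ratio is undefined")
    return {g: tpr / reference for g, tpr in tprs.items()}


# Toy data (hypothetical): six patients, two groups, capacity k = 3.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 1]
groups = ["A", "A", "B", "B", "A", "B"]

ppv = precision_at_top_k(scores, labels, k=3)
tprs = tpr_by_group(scores, labels, groups, k=3)
disparity = tpr_disparity(tprs, reference_group="A")
```

A disparity ratio well below 1 for a group means the model, at this k, finds a smaller share of that group's true cases than it does for the reference group.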
- Define Cohort based on formulation
- Define Outcome/Label based on formulation (will get diagnosed with X in the next z months)
- Define Training and Validation sets over time
- Define and generate predictors
- Train Models on each training set and score all patients in the corresponding validation set
- Evaluate all models for each validation time according to metric (PPV at top k)
- Select "Best" model based on results over time
- Explore the model to understand who it flags, how they compare to the cohort, important predictors
- Check and/or correct for bias issues
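
To make "training and validation sets over time" concrete, here is a simplified sketch of how as-of dates could be laid out so that every training label window closes before the matching validation as-of date. It is a stand-in for illustration only: Triage generates its own temporal splits from the configuration file, and the function names, the 12-month step, and the example dates below are assumptions.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a number of months (may be negative)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def temporal_splits(first_as_of, last_as_of, label_months=12, step_months=12):
    """Build train/validation as-of dates, oldest split first.

    Each training as-of date sits label_months before its validation
    as-of date, so training labels are fully observed before validation
    predictions are made. last_as_of is the latest validation as-of date;
    its labels need data up to last_as_of + label_months.
    """
    splits = []
    validate_as_of = last_as_of
    while True:
        train_as_of = add_months(validate_as_of, -label_months)
        if train_as_of < first_as_of:
            break
        splits.append({
            "train_as_of": train_as_of,
            "train_label_end": validate_as_of,
            "validate_as_of": validate_as_of,
            "validate_label_end": add_months(validate_as_of, label_months),
        })
        validate_as_of = add_months(validate_as_of, -step_months)
    return list(reversed(splits))


# Hypothetical date range: data usable from 2016-01-01, last validation at 2019-01-01.
for split in temporal_splits(date(2016, 1, 1), date(2019, 1, 1)):
    print(split["train_as_of"], "->", split["validate_as_of"])
```

Each model is then trained on one split's training set, scored on that split's validation cohort, and compared on PPV at top k across all validation dates.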
We are using Triage to build and select models. Some background and tutorials on Triage:
- Tutorial on Google Colab - Are you completely new to Triage? Run through a quick tutorial hosted on Google Colab (no setup necessary) to see what Triage can do!
- Dirty Duck Tutorial - Want a more in-depth walk-through of Triage's functionality and concepts? Go through the Dirty Duck tutorial here with sample data
- QuickStart Guide - Try Triage out with your own project and data
- Suggested workflow
- Understanding the configuration file
These steps assume Triage is installed and the data is in a Postgres database. To run:
- activate the virtual environment: `source env/bin/activate`
- run the experiment: `python run.py -c configfilename`
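
For orientation, the sketch below shows one way an entry point like `run.py` might parse its command line. Only the `-c` option comes from the command above; the long option names and the `--replace`, `--save-predictions`, and `--n-processes` flags are hypothetical stand-ins for the run-time choices this project has to make, and the real script may expose them differently (for example, inside the config file).

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical experiment runner."""
    parser = argparse.ArgumentParser(
        description="Run a modeling experiment (illustrative sketch)."
    )
    # -c is the only option taken from the documented command.
    parser.add_argument("-c", "--config", required=True,
                        help="path to the experiment configuration file")
    # The options below are assumptions, not confirmed flags of run.py.
    parser.add_argument("--replace", action="store_true",
                        help="rebuild existing results instead of reusing them")
    parser.add_argument("--save-predictions", action="store_true",
                        help="store individual predictions")
    parser.add_argument("--n-processes", type=int, default=1,
                        help="number of processors to use")
    return parser


# Example: parse an explicit argument list rather than sys.argv.
args = build_parser().parse_args(["-c", "configfilename"])
print(args.config, args.replace, args.save_predictions, args.n_processes)
```

Defaulting `--replace` and `--save-predictions` to off keeps early runs cheap and non-destructive.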
Choices to Make
- replace flag (set to false until we want to nuke everything)
- save predictions (don't for the beginning)
- number of processors to use
- cohort:All
- cohort: all patients who've had a visit in the past 2 years and do not have CKD yet
- label: will get diagnosed with CKD in the next 12 months
- config file, notebook with model selection
- cohort: all patients who've had a visit in the past 2 years, do not have CKD yet, and have no previous abnormal eGFRs
- label: will get diagnosed with CKD in the next 12 months
- config file, notebook with model selection
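
The two cohort definitions and the shared label can be written out as a small Python sketch. This is an illustration under assumptions, not the SQL used in the config files: the `Patient` record, the field names, the 730-day and 365-day windows, and the eGFR threshold of 60 (the usual cutoff for stage 3 or above) are all chosen here for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple

LOOKBACK = timedelta(days=2 * 365)   # "visit in the past 2 years" (assumed as 730 days)
LABEL_WINDOW = timedelta(days=365)   # "next 12 months" (assumed as 365 days)
ABNORMAL_EGFR = 60.0                 # assumed threshold for an abnormal eGFR


@dataclass
class Patient:
    patient_id: int
    visit_dates: List[date]
    ckd_diagnosis_date: Optional[date] = None
    egfr_results: List[Tuple[date, float]] = field(default_factory=list)


def in_cohort(p, as_of, exclude_prior_abnormal_egfr=False):
    """True if the patient belongs to the cohort on the as-of date."""
    recent_visit = any(as_of - LOOKBACK <= v < as_of for v in p.visit_dates)
    has_ckd = p.ckd_diagnosis_date is not None and p.ckd_diagnosis_date < as_of
    if not recent_visit or has_ckd:
        return False
    if exclude_prior_abnormal_egfr:
        # Second cohort: also drop anyone with an abnormal eGFR before as_of.
        if any(d < as_of and value < ABNORMAL_EGFR for d, value in p.egfr_results):
            return False
    return True


def label(p, as_of):
    """1 if the patient is diagnosed with CKD within the label window."""
    d = p.ckd_diagnosis_date
    return int(d is not None and as_of <= d < as_of + LABEL_WINDOW)


# Toy patients (hypothetical data).
as_of = date(2019, 1, 1)
patients = [
    Patient(1, [date(2018, 6, 1)]),
    Patient(2, [date(2018, 3, 1)], ckd_diagnosis_date=date(2019, 5, 1)),
    Patient(3, [date(2018, 3, 1)], ckd_diagnosis_date=date(2017, 1, 1)),
    Patient(4, [date(2015, 1, 1)]),
    Patient(5, [date(2018, 9, 1)], ckd_diagnosis_date=date(2019, 3, 1),
            egfr_results=[(date(2018, 8, 1), 52.0)]),
]

cohort_1 = [p.patient_id for p in patients if in_cohort(p, as_of)]
cohort_2 = [p.patient_id for p in patients
            if in_cohort(p, as_of, exclude_prior_abnormal_egfr=True)]
labels_1 = [label(p, as_of) for p in patients if in_cohort(p, as_of)]
```

Note that the stricter cohort removes patient 5, who is a positive case: excluding prior abnormal eGFRs changes the base rate as well as the cohort size, which matters when comparing PPV at top k between the two setups.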