This organization hosts the GitHub repositories for the Medical Event Data Standard (MEDS), a simple dataset schema for machine learning over electronic health record (EHR) data. Unlike existing tools, pipelines, or common data models, MEDS is a minimal standard designed for maximum interoperability across datasets, existing tools, and model architectures. By providing a simple standardization layer between datasets and model-specific code, MEDS can help make machine learning research on EHR data dramatically more reproducible, robust, computationally performant, and collaborative. Alongside this report, we also release several existing integrations with models, datasets, and tools, and we will work actively with the community to support further adoption and use. See our draft proposal for more details, and please leave comments or questions via GitHub issues to help us improve this effort!
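To make the idea of a "standardization layer between datasets and model-specific code" concrete, here is a minimal sketch in plain Python. It is illustrative only: the field names (`subject_id`, `time`, `code`, `numeric_value`), the example codes, and the `group_by_subject` helper are assumptions made for this example and are not the normative MEDS schema or API. See the Core MEDS repository for the actual definition.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """One observation about one subject (hypothetical field names)."""

    subject_id: int
    time: datetime
    code: str
    numeric_value: Optional[float] = None


def group_by_subject(events):
    """Return {subject_id: [events sorted by time]}.

    Downstream code only needs to understand the shared event shape,
    not the layout of the source dataset it was extracted from.
    """
    timelines = defaultdict(list)
    for event in events:
        timelines[event.subject_id].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e.time)
    return dict(timelines)


# Made-up events; the code strings are placeholders, not a real vocabulary.
events = [
    Event(1, datetime(2021, 3, 2, 9, 30), "LAB//creatinine", 1.1),
    Event(2, datetime(2020, 7, 15, 14, 0), "DIAGNOSIS//example"),
    Event(1, datetime(2021, 3, 1, 8, 0), "ADMISSION"),
]

timelines = group_by_subject(events)
for subject_id, timeline in timelines.items():
    print(subject_id, [e.code for e in timeline])
```

Once every source dataset is mapped to one shared event shape like this, a tool written against that shape can in principle be reused across datasets without dataset-specific parsing.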
Project | Type | Documentation URL | Repository URL | Paper URL | Description |
---|---|---|---|---|---|
Core MEDS | Core | GitHub | GitHub | OpenReview | A data standard and community for building and sharing EHR machine learning tools |
MEDS-Reader | Package | Docs | GitHub | arXiv | An optimized Python package for efficient EHR data processing achieving 10-100x improvements in memory, speed, and disk usage |
MEDS-Transforms | Package | | GitHub | | A set of functions and scripts for extraction to and transformation/pre-processing of MEDS-formatted data. |
MEDS-Tab | Package | Docs | GitHub | | A library designed for automated tabularization, data preparation with aggregations and time windowing. |
ACES | Package | Docs | GitHub | arXiv | A package and configuration language for reproducible extraction of task cohorts for machine learning over event-stream datasets |
MEDS-Torch | Package | Docs | GitHub | | Advancing healthcare machine learning through flexible, robust, and scalable sequence modeling tools. |
MEDS-Evaluation | Package | | GitHub | | Evaluation pipeline for MEDS. |
MEDS-ETL | Package | | GitHub | | Efficient ETL that supports OMOP, MIMIC, eICU, PyHealth. |
FEMR | Package | | GitHub | | A Python package for manipulating longitudinal EHR data for machine learning, with a focus on supporting the creation of foundation models and verifying their presumed benefits in healthcare. |
MEDS-DEV | Benchmark | | GitHub | | A benchmark for evaluating the performance of machine learning models on MEDS-formatted data. |
- CLMBR-T-base: https://huggingface.co/StanfordShahLab/clmbr-t-base
- EHRSHOT: https://ehrshot.stanford.edu
Tools that are planned to be compatible with MEDS: