
Medical Event Data Standard

A minimal, ML-oriented data standard for medical record data to improve reproducibility, robustness, and computational performance.

This organization contains the GitHub repositories for the Medical Event Data Standard (MEDS), a simple dataset schema for machine learning over electronic health record (EHR) data. Unlike existing tools, pipelines, or common data models, MEDS is a minimal standard designed for maximum interoperability across datasets, existing tools, and model architectures. By providing a simple standardization layer between datasets and model-specific code, MEDS can help make machine learning research on EHR data more reproducible, robust, computationally performant, and collaborative.

Alongside this report, we also release several existing integrations with models, datasets, and tools, and we will work actively with the community on further adoption and use. See our draft proposal for more details, and please leave comments or questions via GitHub issues to help us improve this effort! Find the contribution guidelines here.
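To make the idea of a "simple dataset schema" concrete, here is a minimal sketch of a single event row and a per-subject timeline. The field names (`subject_id`, `time`, `code`, `numeric_value`) reflect our understanding of the core MEDS data schema and are not stated on this page; the `Event` class, the example codes, and `sort_key` are hypothetical and exist only for illustration. The `meds` repository holds the authoritative schema definitions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One row of a MEDS-style event table (field names are assumptions)."""

    subject_id: int                        # identifier of the patient/subject
    time: Optional[datetime]               # when the event happened; None for static facts
    code: str                              # categorical description of what happened
    numeric_value: Optional[float] = None  # optional measurement attached to the code


# Hypothetical example events for one subject, deliberately out of order.
events = [
    Event(1, datetime(2020, 1, 2, 7, 15), "LAB//HEART_RATE", 92.0),
    Event(1, None, "GENDER//F"),
    Event(1, datetime(2020, 1, 1, 9, 30), "LAB//HEART_RATE", 88.0),
    Event(1, datetime(2020, 1, 1, 8, 0), "ADMISSION"),
]


def sort_key(event: Event):
    # Static events (time is None) come first, then chronological order.
    return (event.subject_id, event.time is not None, event.time or datetime.min)


timeline = sorted(events, key=sort_key)
heart_rates = [e.numeric_value for e in timeline if e.code == "LAB//HEART_RATE"]
```

Because every dataset reduces to the same few columns, model-specific code only needs to understand this one row shape rather than each source system's tables.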

Software Ecosystem

| Project | Type | Documentation URL | Repository URL | Paper URL | Description |
| --- | --- | --- | --- | --- | --- |
| Core MEDS | Core | GitHub | GitHub | OpenReview | A data standard and community for building and sharing EHR machine learning tools |
| MEDS-Reader | Package | Docs | GitHub | arXiv | An optimized Python package for efficient EHR data processing achieving 10-100x improvements in memory, speed, and disk usage |
| MEDS-Transforms | Package | | GitHub | | A set of functions and scripts for extraction to and transformation/pre-processing of MEDS-formatted data. |
| MEDS-Tab | Package | Docs | GitHub | | A library designed for automated tabularization, data preparation with aggregations and time windowing. |
| ACES | Package | Docs | GitHub | arXiv | A package and configuration language for reproducible extraction of task cohorts for machine learning over event-stream datasets |
| MEDS-Torch | Package | Docs | GitHub | | Advancing healthcare machine learning through flexible, robust, and scalable sequence modeling tools. |
| MEDS-Evaluation | Package | | GitHub | | Evaluation pipeline for MEDS. |
| MEDS-ETL | Package | | GitHub | | Efficient ETL that supports OMOP, MIMIC, eICU, PyHealth. |
| FEMR | Package | | GitHub | | A Python package for manipulating longitudinal EHR data for machine learning, with a focus on supporting the creation of foundation models and verifying their presumed benefits in healthcare. |
| MEDS-DEV | Benchmark | | GitHub | | A benchmark for evaluating the performance of machine learning models on MEDS-formatted data. |
| MEDS-Inspect | Package | | GitHub | | A package to interactively inspect your MEDS data. |
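The "aggregations and time windowing" mentioned for MEDS-Tab can be illustrated with a small, library-free sketch. This is not the MEDS-Tab API: `window_features`, the tuple layout, and the `HR`/`TEMP` codes are hypothetical names chosen for this example. It turns the events in a lookback window before an index time into one flat row of per-code counts and means.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical events as (subject_id, time, code, numeric_value) tuples.
events = [
    (1, datetime(2020, 1, 1, 9, 0), "HR", 80.0),
    (1, datetime(2020, 1, 1, 21, 0), "HR", 90.0),
    (1, datetime(2020, 1, 2, 8, 0), "HR", 100.0),
    (1, datetime(2020, 1, 2, 8, 30), "TEMP", 37.5),
]


def window_features(events, subject_id, index_time, window):
    """Aggregate one subject's events in [index_time - window, index_time]."""
    start = index_time - window
    values_by_code = {}
    for sid, time, code, value in events:
        if sid == subject_id and start <= time <= index_time:
            values_by_code.setdefault(code, []).append(value)

    # One flat feature row: a count and a mean per observed code.
    row = {}
    for code, values in sorted(values_by_code.items()):
        row[f"{code}/count"] = len(values)
        row[f"{code}/mean"] = mean(values)
    return row


index_time = datetime(2020, 1, 2, 12, 0)
row_1d = window_features(events, 1, index_time, timedelta(days=1))
row_7d = window_features(events, 1, index_time, timedelta(days=7))
```

Computing the same aggregations over several window sizes, as with `row_1d` and `row_7d` here, yields fixed-width feature vectors that standard tabular models can consume.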

Pretrained Models

Datasets / Benchmarks

| Dataset | Stays | Version | Frequency | Origin | Originally Published | License | Repository Link | MEDS ETL | Full Dataset Name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUMCdb | 23,000 | v1.0.2 | up to 1 minute | Netherlands | 2019 | Not specified | DANS | GitHub | Amsterdam University Medical Center Database |
| eICU | 201,000 | v2.0 | 5 minutes | USA | 2017 | PhysioNet | PhysioNet | GitHub | eICU Collaborative Research Database |
| HiRID | 34,000 | v1.1.1 | 2 / 5 minutes | Switzerland | 2020 | PhysioNet | PhysioNet | GitHub | High-Resolution ICU Dataset |
| INSPIRE | 130,000 | v1.2 | Not specified | South Korea | 2024 | Korea Credentialed Health Data License | PhysioNet | GitHub | INformative Surgical Patient dataset for Innovative Research Environment |
| MIMIC-IV | 73,000 | v3.1 | ~1 hour | USA | 2020 | PhysioNet | PhysioNet | GitHub | Medical Information Mart for Intensive Care IV |
| NWICU | 25,000 | v0.1.0 | Not specified | USA | 2023 | PhysioNet | PhysioNet | GitHub | Northwestern ICU Database |
| SICdb | 27,350 | v1.0.8 | 1 minute | Austria | 2024 | PhysioNet | PhysioNet | GitHub | Salzburg Intensive Care Database |

Coming Soon...

Tools that are planned to be compatible with MEDS:

Pinned

  1. meds (Public, Python)

    Schema definitions and Python types for Medical Event Data Standard, a standard for medical event data such as EHR and claims data

  2. MEDS-DEV (Public, Python)

    The MEDS Decentralized Extensible Validation (MEDS-DEV) Benchmark: Establishing Reproducibility and Comparability in ML for Health

  3. MIMIC_IV_MEDS (Public, Python)

    The MIMIC-IV MEDS ETL

  4. meds_testing_helpers (Public, Python)

    Testing, benchmarking, and synthetic data generation helpers for MEDS tools, pipelines, and models.

  5. ETL_MEDS_Template (Public template, Python)

    A template repository for a MEDS-Transforms powered extraction pipeline for a custom dataset.

  6. meds_etl (Public, Python)

    A collection of ETLs from common data formats to Medical Event Data Standard
