Medical Datasets
Longitudinal Brain Imaging Set
Longitudinal Brain Imaging Set comprises deidentified brain data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Pediatric Brain Imaging Set 10
Pediatric Brain Imaging Set 10 comprises deidentified brain data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Orthopedic Signal Database
Adult Orthopedic Signal Database comprises deidentified orthopedic data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Benchmark Cardiac Signal Database
Benchmark Cardiac Signal Database comprises deidentified cardiac data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
National Brain Collection
National Brain Collection comprises deidentified brain data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Open Diabetes Signal Database
Open Diabetes Signal Database comprises deidentified diabetes data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Longitudinal Lung Imaging Set
Longitudinal Lung Imaging Set comprises deidentified lung data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Open Brain Dataset
Open Brain Dataset comprises deidentified brain data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Multi-Center COVID-19 Consortium
Multi-Center COVID-19 Consortium comprises deidentified COVID-19 data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Annotated Trauma Archive
Annotated Trauma Archive comprises deidentified trauma data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
National Orthopedic Registry
National Orthopedic Registry comprises deidentified orthopedic data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Imaging COVID-19 Consortium
Imaging COVID-19 Consortium comprises deidentified COVID-19 data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Benchmark Dermatology Archive
Benchmark Dermatology Archive comprises deidentified dermatology data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Multi-Modal Pathology Archive
Multi-Modal Pathology Archive comprises deidentified pathology data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Brain Consortium
Adult Brain Consortium comprises deidentified brain data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Imaging Stroke Repository
Imaging Stroke Repository comprises deidentified stroke data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Wearable Sepsis Imaging Set
Wearable Sepsis Imaging Set comprises deidentified sepsis data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Retina Signal Database
Adult Retina Signal Database comprises deidentified retina data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Open Cardiac Consortium
Open Cardiac Consortium comprises deidentified cardiac data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Pediatric Orthopedic Imaging Set
Pediatric Orthopedic Imaging Set comprises deidentified orthopedic data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Telemonitoring Pathology Archive
Telemonitoring Pathology Archive comprises deidentified pathology data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Telemonitoring Dermatology Collection
Telemonitoring Dermatology Collection comprises deidentified dermatology data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Wearable Retina Archive
Wearable Retina Archive comprises deidentified retina data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Cancer Imaging Set
Adult Cancer Imaging Set comprises deidentified cancer data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Benchmark Dermatology Imaging Set
Benchmark Dermatology Imaging Set comprises deidentified dermatology data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
National Sepsis Signal Database
National Sepsis Signal Database comprises deidentified sepsis data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Longitudinal Retina Imaging Set
Longitudinal Retina Imaging Set comprises deidentified retina data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Pediatric Sepsis Imaging Set
Pediatric Sepsis Imaging Set comprises deidentified sepsis data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Benchmark Retina Imaging Set
Benchmark Retina Imaging Set comprises deidentified retina data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Pediatric Retina Repository
Pediatric Retina Repository comprises deidentified retina data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Longitudinal Cancer Registry
Longitudinal Cancer Registry comprises deidentified cancer data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Benchmark Orthopedic Dataset
Benchmark Orthopedic Dataset comprises deidentified orthopedic data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Cancer Dataset
Adult Cancer Dataset comprises deidentified cancer data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Genomic Dataset
Adult Genomic Dataset comprises deidentified genomic data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Adult Diabetes Signal Database
Adult Diabetes Signal Database comprises deidentified diabetes data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Open Diabetes Consortium
Open Diabetes Consortium comprises deidentified diabetes data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
Multi-Center Orthopedic Archive
Multi-Center Orthopedic Archive comprises deidentified orthopedic data for research purposes; includes images/signals/clinical annotations collected across multiple centers.
PhysioNet MIMIC-Note (Clinical NLP)
Deidentified clinical notes for NLP research (subset of MIMIC with notes).
The Cancer Imaging Archive (TCIA)
Open archive of cancer imaging datasets with annotations.
Human Protein Atlas (HPA)
Molecular data on protein expression across tissues and cells.
GTEx (Genotype-Tissue Expression)
Tissue-specific gene expression across many donors.
ArrayExpress
Functional genomics data repository from EBI.
GEO Series (Gene Expression Omnibus)
Repository of gene expression and high-throughput functional genomic data.
WHO Global Health Observatory (GHO) datasets
WHO's data portal with thousands of health indicators.
BIMCV COVID-19 Dataset
Annotated chest X-rays and CTs from COVID-19 patients.
COVID-19 Radiography Database
Aggregated CXR images of COVID-19, pneumonia, and normal cases.
PhysioNet Challenge Datasets
Collection of challenge datasets (ECG, ICU, etc.) from PhysioNet.
Diabetic Retinopathy Detection (Kaggle)
Fundus images labeled by DR severity.
ISIC Skin Cancer Archive
Large dermoscopic image repository for skin lesion analysis.
HAM10000 (Skin lesion)
Dermatoscopic images for skin lesion classification.
All of Us Research Program
Large US cohort with health records, genomics, and survey data (access controlled).
ClinicalTrials.gov Public Data
Global registry of clinical trials and study metadata.
ChEMBL
Bioactivity database of drug-like molecules.
ClinVar
Public archive of relationships between genetic variants and phenotypes.
gnomAD
Aggregated exome and genome sequencing data for allele frequencies.
RSNA Pneumonia Detection Challenge (Kaggle)
Labeled CXR dataset with bounding box annotations for pneumonia.
CheXpert Chest Radiograph Dataset
Large labeled chest radiograph dataset for detection of pathologies.
NIH Chest X-ray Dataset (ChestX-ray14)
Large chest X-ray dataset with multiple thoracic disease labels.
ADNI (Alzheimer's Disease Neuroimaging Initiative)
Longitudinal imaging and biomarker data for Alzheimer's disease research.
eICU Collaborative Research Database
Multi-center critical care database from over 200 hospitals with high-resolution vital signs and treatments.
MIMIC-IV Clinical Database
Large ICU dataset with deidentified health records including vitals, labs, medications, and clinical notes.
Bone Fracture Detection X-Ray Dataset
A curated dataset of X-ray images for bone fracture detection. Includes annotated fractures across multiple bone types including wrist, ankle, and elbow.
Knee Osteoarthritis Dataset from OAI
The Osteoarthritis Initiative (OAI) is a multi-center, longitudinal, prospective observational study of knee osteoarthritis. Includes MRI images, X-rays, and clinical data.
Blood Cell Classification Dataset
This dataset contains 12,500 augmented images of blood cells (JPEG) with accompanying cell type labels (CSV). There are approximately 3,000 images for each of 4 different cell types grouped into 4 different folders.
Pathology Image Database
A collection of histopathology images from various tissues and diseases. Includes annotations from pathologists and machine learning labels.
ClinVar Database
ClinVar is a public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar facilitates access to and communication about the relationships asserted between human variation and observed health status.
1000 Genomes Project
The 1000 Genomes Project produced a catalogue of common human genetic variation, using openly consented samples from people who declared themselves to be healthy.
Side Effect Resource (SIDER)
SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts.
ChEMBL Database
ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid translation of genomic information into effective new drugs.
DrugBank Database
DrugBank is a comprehensive, freely accessible, online database containing information on drugs and drug targets. It combines detailed drug data with comprehensive drug target information.
Medical Cost Personal Dataset
This dataset contains demographics, health indicators, and medical costs of individuals. Useful for understanding factors affecting healthcare costs.
Clinical Trials Data from ClinicalTrials.gov
A comprehensive database of privately and publicly funded clinical studies conducted around the world. Data includes study protocols, results, and locations.
Open-i: Open Access Biomedical Image Search Engine
Open-i is an open-access biomedical image search engine that allows users to search for biomedical literature figures from the PubMed Central repository.
OCT Images for Retinal Diseases
Optical Coherence Tomography (OCT) images of the retina for detection of choroidal neovascularization (CNV), diabetic macular edema (DME), and drusen.
Diabetic Retinopathy Detection Dataset
This dataset contains images of retinas taken using fundoscopy. The images are categorized into 5 levels based on the severity of diabetic retinopathy (DR).
Skin Cancer Detection Dataset
This dataset contains images of benign and malignant skin lesions. The dataset consists of 1,800 images of common pigmented skin lesions across 7 lesion types.
Breast Cancer Wisconsin Diagnostic Dataset
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The Cancer Genome Atlas (TCGA)
TCGA is a landmark cancer genomics program that molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types.
Epileptic Seizure Recognition Dataset
This dataset is a pre-processed and restructured version of a widely used dataset for epileptic seizure recognition. The original dataset consists of 5 folders, each with 100 files.
OpenNeuro: Brain Imaging Data Repository
OpenNeuro is a free and open platform for sharing neuroimaging data from human brain imaging research studies. It provides access to publicly available brain imaging datasets.
Alzheimer's Disease Neuroimaging Initiative (ADNI)
ADNI is a longitudinal study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's disease.
ECG Arrhythmia Classification Dataset
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory.
Chest X-Ray Images for Pneumonia Detection
This dataset contains 5,863 X-ray images (JPEG) in 2 folders (train/test). There are 2 classes: Pneumonia and Normal. Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients.
MIMIC-III Clinical Database
MIMIC-III is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.