A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models

Builders & Current Maintainers: Weijieying Ren, YuQing Huang, Jingxi Zhu, Zehao Liu, Tianxiang Zhao, and Prof. Vasant Honavar.

Paper List

We summarize the main branches of work on EHR modeling, including its downstream tasks and applications. For more details, please refer to our recent survey (paper).

Branch 8: Clinical Applications

8.1 Clinical Understanding

8.1.1 Clinical Notes Summarization

8.1.2 Concept Extraction

8.1.3 Image-Text Retrieval

8.1.4 Question Answering

8.1.5 Clinical Education / Knowledge Sharing

8.1.6 General Clinical AI Capabilities

8.2 Clinical Reasoning and Decision Support

8.2.1 Diagnosis Support

8.2.1.1 Diagnosis Prediction

8.2.1.1.1 Early Detection
8.2.1.1.2 Differential Diagnosis

8.2.2 Prognostic Forecasting

8.2.2.1 Risk Prediction

8.2.2.2 Prognosis Estimation

8.2.2.3 Readmission Prediction

8.2.2.4 Mortality Prediction

8.2.3 Treatment Modeling

8.2.3.1 Treatment Effect Estimation

8.2.3.2 Treatment Recommendation

8.2.3.3 Counterfactual Simulation

8.2.4 Patient Stratification and Cohort Discovery

8.2.4.1 Subtyping / Endotyping

8.2.4.2 Personalized Cohort Selection

8.3 Clinical Operations Support

8.3.1 Workflow Optimization

8.3.1.1 Triage and Prioritization

8.3.1.2 Clinical Note Generation

8.3.1.2.1 Radiology Report Generation
  • Collaboration between clinicians and vision-language models in radiology report generation
    Ryutaro Tanno, David G. T. Barrett, Andrew Sellergren, Sumedh Ghaisas, Sumanth Dathathri, Abigail See, Johannes Welbl, Charles Lau, Tao Tu, Shekoofeh Azizi, Karan Singhal, Mike Schaekermann, Rhys May, Roy Lee, SiWai Man, Sara Mahdavi, Zahra Ahmed, Yossi Matias, Joelle Barral, S. M. Ali Eslami, Danielle Belgrave, Yun Liu, Sreenivasa Raju Kalidindi, Shravya Shetty, Vivek Natarajan, Pushmeet Kohli, Po-Sen Huang, Alan Karthikesalingam & Ira Ktena
    Nature Medicine 2025
8.3.1.3 Care Recommendation

8.3.2 Patient-Task Matching

8.3.2.1 Trial Matching

8.3.3 Clinical Interaction Agents

8.3.3.1 Assistants for Clinical Decision-Making

8.3.3.2 Assistants for Administrative Communication

8.3.3.3 Assistive Devices and Technologies

8.3.4 Administrative Automation

8.3.4.1 Clinical Coding Automation

8.3.4.2 Scheduling and Resource Allocation

Branch 1: Data-Centric Approaches

Branch 2: Neural Modeling Strategies

2.1 Feature-Aware Modules

2.1.1 Discretization and Binning-Based Methods

2.1.2 Kernel-Based Methods

2.2 Model Architecture Design

2.2.1 Tree-based

2.2.2 Graph-based

2.2.3 Rule-based Models

2.2.4 Additive-model-based

2.2.5 Hierarchical and Structured Temporal Models

2.3 Temporal Dependency Modeling

2.3.1 Irregular/Asynchronous Sampling

2.3.2 Multi-Timescale Dynamics

2.3.3 Conditional Clinical Sequences

2.4 Meta-Architectural Strategies

2.4.1 Meta-Adaptive Modeling

2.4.2 Neural Architecture Search (NAS)

Branch 3: Learning-Focused Approaches

3.1 Self-Supervised Learning

3.2 Clustering-Based Methods

3.3 Latent Representation Learning

3.4 Causal Representation Learning

3.5 Continual Learning

Branch 4: Learning with External Modalities and Knowledge

4.1 Multi-modal Learning

4.1.1 Cross-Modal Alignment

4.1.1.1 Global Alignment

4.1.1.2 Fine-Grained Alignment

4.1.1.2.1 Fine-Grained Contextual Understanding
4.1.1.2.2 Extending 2D to 3D Imaging
4.1.1.2.3 Region-Level Medical LMMs

4.1.1.3 Data-Efficient Parallel and Unpaired Alignment

4.1.1.3.1 Parallel Data Collection
4.1.1.3.2 Learning from Unpaired Data

4.1.2 Knowledge-Informed Modeling

4.1.3 Temporal and Asynchronous Integration and Modeling

4.1.4 Modality-Specific Robustness

4.2 Multi-source Learning

  • Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.
    Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Joyce C. Ho
    Proceedings of the Web Conference 2021, pages 171–182.
4.3 Learning with Knowledge Graphs

4.4 Learning with External Data Sources

4.4.1 External Knowledge via Prompting and Instruction Tuning

4.4.2 Internalized Knowledge from LLMs

4.4.3 Case-Based Knowledge from Patient Records

Branch 5: LLM-Based Modeling and Systems

5.1 Learning with LLMs

5.1.1 Prompt-Based Methods

5.1.2 Pretraining Methods

5.1.3 Fine-Tuning Methods

5.1.4 Retrieval-Augmented Methods

5.2 LLM-Driven Medical Agents

1.2.6 Masked Modeling

1.3 Reinforcement Learning

1.4 Temporal Modeling

1.7 Clinical Agent

Others: Benchmark

1.3 Learning with External Knowledge

1.3.1 Learning with Good Model Initialization

1.3.2 Learning with Knowledge Graphs

1.3.3 Learning with Large Language Models

1.4 Causal Representation Learning

Branch 2: Downstream Tasks

2.1 Generation

2.1.1 GAN-based Models

2.1.2 VAE-based Models

2.1.3 Diffusion-based Models

2.1.4 Transformer-based

2.1.5 Large Language Model-based

2.1.6 Model-agnostic

2.2 Anomaly Detection

2.3 Transfer Learning

2.4 Explanation/Model Assessment

2.5 Retrieval

2.6 Efficiency

Branch 3: Application

3.1 Clinical Tabular Data

3.2 Financial Tabular Data

Existing Surveys

Tools & Libraries

Last updated on March 05, 2024. (For problems, contact wjr5337@psu.edu. To add papers, please submit a pull request to our repo.)