Clinical NLP

CLIMB: A Benchmark of Clinical Bias in Large Language Models

We introduce a pioneering, comprehensive benchmark that evaluates both intrinsic bias (within LLMs) and extrinsic bias (on downstream tasks) in LLMs for clinical decision tasks. Our experiments across popular and medically adapted LLMs, particularly from the Mistral and LLaMA families, reveal that both intrinsic and extrinsic bias are prevalent. This work underscores the critical need to mitigate clinical bias and sets a new standard for future evaluations of LLMs' clinical bias.

Jul 7, 2024

CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

We introduce CliBench, a novel benchmark offering a comprehensive and realistic assessment of LLMs' capabilities in clinical diagnosis. The benchmark covers diagnosis across a diverse range of medical cases and specialties, and also incorporates other clinically significant tasks: treatment procedure identification, lab test ordering, and medication prescription.

Jun 14, 2024

Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction

We introduce MERA, a clinical diagnosis prediction model that bridges pretraining natural language knowledge with medical practice. We apply hierarchical contrastive learning on a ranked list of disease candidates to alleviate the problem of a large decision space. Through concept memorization during fine-tuning, we link natural language clinical knowledge to medical codes.

Jun 13, 2024

A Systematic Evaluation of Decoding-Free Generative Candidate Selection Methods

Existing work uses decoding-free candidate selection methods to estimate candidate probabilities from the initial output logits over the vocabulary. Although these estimation methods are widely used, they have not been systematically evaluated, especially on end tasks. We introduce an evaluation of a comprehensive collection of decoding-free candidate selection approaches.

Jun 13, 2024

DICE: Data-Efficient Clinical Event Extraction with Generative Models

We introduce DICE, a robust and data-efficient generative model for clinical event extraction that specializes in clinical mention identification, along with MACCROBAT-EE, the first clinical event extraction dataset with event argument annotation.

May 2, 2023