Putting AI to the Clinical Test

A study published in Nature Medicine has subjected OpenAI's ChatGPT to a structured evaluation of its ability to make medical triage recommendations — the critical first step in emergency care where patients are sorted by the urgency of their condition. The research represents one of the most methodologically rigorous assessments to date of whether large language models can perform reliably in clinical settings where errors can have life-or-death consequences.

Triage is a particularly challenging test for AI systems because it requires integrating multiple streams of information — reported symptoms, patient history, vital signs, and contextual cues — to make rapid judgments about how urgently a patient needs care. Getting it wrong in either direction carries serious risks: under-triaging a critical patient can lead to delayed treatment and preventable death, while over-triaging a stable patient wastes scarce emergency resources.

Study Design and Methodology

The researchers designed a structured test using standardized clinical vignettes — detailed written descriptions of patient presentations that are commonly used in medical education and board examinations. Each vignette included information about the patient's presenting complaint, relevant medical history, vital signs, and physical examination findings.

ChatGPT was asked to assign each case to one of five standard triage categories, ranging from immediate life-threatening emergencies requiring instant intervention to non-urgent conditions that could safely wait for routine care. The AI's recommendations were then compared against consensus triage assignments made by experienced emergency medicine physicians.
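
The comparison described above can be sketched in a few lines of Python. This is an illustration only, not the researchers' actual analysis code: the function name `score_triage` is hypothetical, and the sketch assumes the five categories are encoded as integers from 1 (most urgent) to 5 (least urgent), an encoding the article does not specify. Under that assumption, a model level numerically higher than the physician consensus counts as under-triage, and a lower one as over-triage.

```python
from collections import Counter


def score_triage(model_levels, consensus_levels):
    """Compare model triage levels against physician consensus levels.

    Assumption (not from the study): levels are integers 1..5,
    where 1 is the most urgent category and 5 the least urgent.
    Returns the fraction of cases that were exact matches,
    under-triaged, and over-triaged.
    """
    if len(model_levels) != len(consensus_levels):
        raise ValueError("Both lists must cover the same vignettes")
    if not model_levels:
        raise ValueError("At least one vignette is required")

    counts = Counter()
    for model, consensus in zip(model_levels, consensus_levels):
        if model == consensus:
            counts["exact"] += 1
        elif model > consensus:
            # Model rated the case as less urgent than the physicians did.
            counts["under"] += 1
        else:
            # Model rated the case as more urgent than the physicians did.
            counts["over"] += 1

    total = len(model_levels)
    return {key: counts[key] / total for key in ("exact", "under", "over")}


# Made-up example data for four vignettes (not study results).
example = score_triage([1, 3, 2, 5], [1, 2, 3, 5])
```

Reporting under-triage and over-triage separately, rather than a single accuracy figure, reflects the point made earlier that the two kinds of error carry different risks.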

The study controlled for several variables that have complicated previous evaluations of AI medical performance. Prompts were standardized to eliminate variation in how questions were posed to the model. Multiple runs were conducted to assess consistency, and the researchers analyzed not just the accuracy of the final triage assignment but also the reasoning the model provided.
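
One simple way to quantify consistency across repeated runs is to look, for each vignette, at how often the model returned its own most common answer. The sketch below assumes that approach; the article does not say which consistency measure the researchers used, and the function name `run_consistency`, the vignette identifiers, and the integer level encoding are all hypothetical.

```python
from collections import Counter


def run_consistency(runs_per_vignette):
    """Measure how consistently repeated runs agree on each vignette.

    `runs_per_vignette` maps a vignette identifier to the list of triage
    levels the model assigned across repeated runs (hypothetical format).
    Returns, per vignette, the most common level and the fraction of
    runs that produced it.
    """
    summary = {}
    for vignette_id, levels in runs_per_vignette.items():
        if not levels:
            raise ValueError(f"No runs recorded for vignette {vignette_id!r}")
        modal_level, modal_count = Counter(levels).most_common(1)[0]
        summary[vignette_id] = (modal_level, modal_count / len(levels))
    return summary


# Made-up example data: two vignettes, four runs each (not study results).
example = run_consistency({"v1": [2, 2, 2, 2], "v2": [3, 3, 4, 3]})
```

A fraction of 1.0 means every run agreed. Lower values flag vignettes where the same standardized prompt produced different triage levels.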