AI fails at primary patient diagnosis more than 80% of the time, study finds

Euronews
Apr 14, 2026

AI language models fail to produce an appropriate early diagnosis more than 80% of the time, suggesting they are not yet safe for unsupervised clinical use, according to a new study.

Generative artificial intelligence (AI) still lacks the reasoning processes needed for safe clinical use, a new study has found.

AI chatbots have improved their diagnostic accuracy when presented with comprehensive clinical information, but still failed to produce an appropriate differential diagnosis more than 80% of the time, according to researchers at Mass General Brigham, a Boston-based non-profit hospital and research network and one of the largest health systems in the United States.

The study, published in the open-access medical journal JAMA Network Open, found that large language models (LLMs) fall short of the reasoning required for clinical use.

“Despite continued improvements, off-the-shelf large language models are not ready for unsupervised clinical-grade deployment,” said Marc Succi, co-author of the study.

He added that AI cannot yet replicate differential diagnosis, which is central to clinical reasoning, and which he considers the “art of medicine”.

Differential diagnosis is the first step healthcare professionals take to identify a condition, distinguishing it from others with similar symptoms.