Dr AI? Should artificial intelligence replace qualified doctors after high scores in trials?

Is a headache a warning sign of stroke? Does a cough require an X-ray? What do abnormal test results really mean?

With just a few taps to describe symptoms and upload medical reports, people can receive a polished, seemingly professional assessment from artificial intelligence (AI) in seconds. More and more people have begun to turn to it for medical advice before seeing a doctor.

But does that mean AI can truly diagnose and treat patients?

Straight-A student in standardised tests

A study published in early April by researchers at Germany’s Marburg University and University Hospital Giessen and Marburg found that in a standardised knowledge test on acute kidney injury (AKI), several large language models (LLMs) outperformed the medical professionals who took part in the assessment.

Researchers compared 13 publicly available LLMs with 123 volunteer participants at the 131st Annual Congress of the German Society of Internal Medicine, including medical students and physicians in internal medicine. Both groups completed the same AKI knowledge assessment, which consisted of two case vignettes and 15 single-best-answer multiple-choice questions.

The LLMs achieved a mean score of 13.5 out of 15, or 90 per cent, with several models reaching a perfect score, while the human participants averaged 7.3 out of 15, or 48.7 per cent. The models also completed the test far more quickly.

“These findings show that LLMs can provide factual medical knowledge very quickly. That creates opportunities for everyday clinical practice,” said Philipp Russ, the study’s corresponding author.

Weak link in clinical reasoning

High scores on standardised tests, however, do not necessarily mean AI has the judgment required for real-world clinical care. A study published in JAMA Network Open on April 13 found that LLMs still fall short in clinical reasoning, especially in the early stages of a case, when limited information often prevents them from generating an appropriate differential diagnosis.

To better reflect how diagnosis unfolds in practice, researchers at Mass General Brigham and other institutions evaluated 21 frontier LLMs using 29 standardised clinical vignettes. The models were fed information step by step, beginning with basic details such as a patient’s age, gender and symptoms, followed by physical examination findings and laboratory results. Their performance at each stage was assessed by medical student evaluators.

The results showed that all the models failed to produce an appropriate differential diagnosis more than 80 per cent of the time. That means they often could not reliably determine the most likely cause, rule out serious disease or offer sound guidance on what should be investigated next.

“Differential diagnoses are central to clinical reasoning and underlie the ‘art of medicine’ that AI cannot currently replicate,” said corresponding author Marc Succi, adding that the promise of AI in clinical medicine continues to lie in its potential to augment, not replace, physician reasoning.

Doctor-led collaboration

If AI is not ready to practise medicine on its own, what role should it play in healthcare? Jens Kleesiek, director of the Institute for Artificial Intelligence in Medicine at Essen University Hospital and the University of Duisburg-Essen, said that thanks to AI, the collaboration between doctors and computers is constantly improving.

“We are at a point where digital systems no longer just provide support, but actively intervene in processes. For example, by taking over documentation or coordinating procedures,” Kleesiek said at the opening of the 2026 Annual Congress of the German Society of Internal Medicine on April 18. “This will fundamentally change medical care.”

Even so, the doctor’s primary responsibility remains unchanged. Kleesiek emphasised that the human factor is still crucial and that AI must be deployed under the guidance of physicians with the expertise to understand the technology and use it properly. Marc Succi made a similar point, saying that “LLMs in healthcare continue to require a ‘human in the loop’ and very close oversight.”

As AI is pushed further into clinical practice, the risks that come with it also require close attention. Fares Alahdab, an associate professor at the University of Missouri School of Medicine, warned that experienced clinicians are often better able to spot flawed AI-generated suggestions, while medical students may lack the judgment needed to detect subtle but potentially dangerous errors.

“A more insidious risk is the outsourcing of reasoning, a process that tends to occur gradually and almost imperceptibly,” he said, adding that AI models produce fluent, polished responses that can lead users to abandon independent information-seeking, critical appraisal and knowledge synthesis. Over time, this may erode skills that should be continuously reinforced.

  • A Tell Media / Xinhua report