Radiologists & AI often fail to detect manipulated X-ray images

Is a new era of medical disinformation looming? According to a study, experts and AI systems alike can barely tell AI-generated X-ray images from real ones.

CT scan of the brain of a patient with acute subarachnoid hemorrhage after a traffic accident (symbolic image). (Image: Tomatheart / Shutterstock.com)

By Dr. Fabio Dennstädt

For the first time, AI models like ChatGPT make it possible for laypeople to create anatomically plausible, AI-generated X-ray images solely through simple text commands. While this could be useful in medical training for simulating rare diseases, researchers warn of enormous risks of misuse, such as insurance fraud, legal disputes, or the targeted manipulation of research data.

Scientists at Mount Sinai Hospital in New York have investigated how well 17 experienced radiologists from six countries and several current AI models can recognize “deepfakes” of X-ray images. The results reveal a worrying problem.

For their investigation, the researchers used two datasets. The first consisted of 154 X-ray images covering various body regions such as the chest, spine, arms, and legs. However, half of the images were not real X-rays but AI images generated by GPT-4o. The second dataset contained chest X-rays produced by a specialized AI model for generating medical images.


The study proceeded in three phases:

  1. Blind Phase: The radiologists were asked to assess the technical quality and make diagnoses. They were not informed that AI images were included.
  2. Identification Phase: After the doctors were informed about the deepfakes, they had to decide which images were real and which were AI-generated.
  3. AI Comparison: Four leading AI models (GPT-4o, GPT-5, Gemini 2.5 Pro, and Llama 4 Maverick) were given the same task of distinguishing real from AI-generated images.

The accuracy in recognizing AI-generated X-ray images was surprisingly low and did not depend on the medical professionals' experience.

In the blind phase, only 41 percent of the radiologists (7 out of 17) spontaneously suspected that AI-generated images might be present in the dataset. The remaining experts considered the deepfakes to be authentic clinical cases. However, even in the identification phase (after the radiologists were explicitly asked to look for AI fakes), their average accuracy was only about 75 percent. This means that one in four images was misjudged.

Interestingly, radiologists with up to 40 years of service did not perform significantly better than residents. The ability to recognize deepfakes appears to be a completely new skill that is not acquired through traditional clinical experience.

The AI models themselves had similar difficulties distinguishing AI-generated X-ray images from real ones. None of the tested models could reliably identify the synthetic images.

While the OpenAI models achieved an accuracy of about 83 to 85 percent, Google's Gemini 2.5 Pro and Meta's Llama 4 Maverick performed significantly worse, achieving scores between 56 and 59 percent (which is barely better than random guessing). GPT-4o, which was used to create the synthetic images, was also unable to reliably distinguish them from real images.

Despite the high quality of the deepfakes, the study notes certain characteristics that point to AI generation. Bone structures, for example, often appear excessively smooth and lack the fine, irregular textures of real biological tissue. Image noise is another technical indicator: in real X-rays the noise varies irregularly because of the physical properties of radiation, whereas an AI's grain pattern often looks unnaturally uniform across the entire image. AI models also stumble over anatomical subtleties; details such as the shadows of nail beds on fingers or the fine vascular patterns in the lungs are often omitted or misrepresented, which can be an indication of manipulation.
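The noise criterion in particular lends itself to automated checking. The following is a minimal sketch, not taken from the study, of how one might compare noise levels across patches of a grayscale X-ray; the patch size, the Laplacian-based noise estimate, and the uniformity score are all illustrative assumptions that would need empirical calibration. The intuition: in a real radiograph the noise strength varies with exposure across the image, while a synthetically uniform grain yields nearly identical per-patch estimates.

```python
import numpy as np

def local_noise_levels(image: np.ndarray, patch: int = 32) -> np.ndarray:
    """Estimate the noise level in each patch of a grayscale image.

    Noise is approximated as the standard deviation of the Laplacian
    (high-pass) residual, which suppresses smooth anatomical structure.
    """
    # Simple 4-neighbor Laplacian as a high-pass filter.
    lap = (
        4 * image[1:-1, 1:-1]
        - image[:-2, 1:-1] - image[2:, 1:-1]
        - image[1:-1, :-2] - image[1:-1, 2:]
    )
    h, w = lap.shape
    levels = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            levels.append(lap[y:y + patch, x:x + patch].std())
    return np.array(levels)

def noise_uniformity_score(image: np.ndarray) -> float:
    """Spread of per-patch noise relative to its mean.

    Heuristic assumption: real radiographs tend to show clearly varying
    noise (higher score), while a suspiciously flat grain yields a low
    score. Any decision threshold would have to be calibrated on data.
    """
    levels = local_noise_levels(image.astype(np.float64))
    return float(levels.std() / (levels.mean() + 1e-9))
```

Such a heuristic would at best be one signal among several; the study's findings suggest that no single cue, human or automated, is reliable on its own.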

The authors warn that the technical hurdle for creating deceptively realistic medical images has fallen dramatically. As they write, a simple text prompt is now enough to invent a bone fracture or a tumor that deceives even experts.

To safeguard trust in digital radiology, the study's authors recommend a multi-stage security strategy. First, radiologists should receive dedicated training that sharpens their eye for the subtle artifacts and inconsistencies of AI-generated images. Second, the experts consider robust technical safeguards essential: digital signatures, invisible watermarks, or blockchain-based provenance records could guarantee the authenticity of medical images. These approaches should be complemented by independent, automated detectors that identify and reliably flag deepfakes in clinical practice through in-depth pixel analysis.
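To illustrate the signature idea, here is a generic sketch, not a scheme from the study: an imaging device or PACS signs the raw pixel bytes at acquisition time, so any later manipulation of the pixels invalidates the signature. The key handling and the choice of Ed25519 via the Python cryptography library are assumptions for the example; real deployments would anchor keys in hardware and sign standardized DICOM structures.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)
from cryptography.exceptions import InvalidSignature

# In practice the private key would live inside the imaging device or
# PACS; generating it inline here is purely for illustration.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

def sign_image(pixel_bytes: bytes) -> bytes:
    """Sign the raw pixel data at acquisition time."""
    return private_key.sign(pixel_bytes)

def verify_image(pixel_bytes: bytes, signature: bytes) -> bool:
    """Check that the pixels are unchanged since signing."""
    try:
        public_key.verify(signature, pixel_bytes)
        return True
    except InvalidSignature:
        return False

original = b"...raw pixel data (placeholder)..."
sig = sign_image(original)
assert verify_image(original, sig)             # untouched image verifies
assert not verify_image(original + b"x", sig)  # any edit breaks it
```

Note that a signature only proves the image is unchanged since signing; it cannot prove the image was genuine at that moment, which is why the authors pair it with provenance records and automated detectors.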

(afl)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.