Colonoscopy study: AI polyp detection offers little help to experienced doctors

A randomized study in German private practices shows: AI-assisted polyp detection does not improve the adenoma detection rate of experienced gastroenterologists.


EndoMind marks detected adenomas with a blue rectangle during colonoscopy.

(Image: Uniklinikum Würzburg)


In colonoscopy-based colorectal cancer screening, AI systems are intended to help detect more polyps. A new randomized controlled study from Germany now comes to a sobering conclusion: In the clinical practice of experienced gastroenterologists, computer-aided polyp detection offers no measurable advantage.

The multicenter study, published as open access in the journal npj Digital Medicine, investigated the EndoMind system developed at the University Hospital Würzburg in five German private practices. This design was chosen because a large proportion of screening colonoscopies in Germany are performed on an outpatient basis in gastroenterological practices, not in academic centers. Between November 2021 and November 2022, 914 patients were randomly assigned to examination with or without AI support. All ten participating endoscopists had more than ten years of experience and had each performed over 10,000 colonoscopies.

The central finding: the so-called adenoma detection rate (ADR) – the proportion of examinations in which at least one adenoma is found – was 34.5 percent in the AI-assisted group and 32.9 percent in the control group. The difference of 1.6 percentage points was not statistically significant (p = 0.656). According to the study, none of the secondary endpoints showed significant differences either: the overall polyp detection rate, the detection rate for serrated lesions, the number of adenomas per examination, and the endoscope withdrawal time were all comparable between the two groups.
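For readers who want to follow the statistics: the reported non-significance can be roughly sanity-checked with a simple pooled two-proportion z-test. This is a sketch under assumptions – it presumes the 914 randomized patients split about evenly between the arms and uses a plain z-test, whereas the study's exact per-arm counts and test procedure are given in the paper:

```python
from math import sqrt, erf

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided two-proportion z-test with pooled variance."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value via the standard normal CDF (math.erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# ADR 34.5 % vs. 32.9 %; assuming roughly 457 patients per arm
z, p = two_proportion_z(0.345, 457, 0.329, 457)
print(round(p, 2))  # far above the 0.05 significance threshold
```

Under these assumptions the p-value lands in the same ballpark as the reported 0.656: a 1.6-point gap at this sample size is well within the range of chance.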

EndoMind uses a YOLOv4 architecture for real-time object detection, trained on more than 506,000 manually annotated images. The system marks detected polyps in real time with a bounding box, with a median time to first detection of 130 milliseconds; the false-positive rate was just 2.2 percent. In an earlier pilot study, EndoMind had performed on par with commercial CADe (computer-aided detection) systems. Technically, then, the system meets the requirements, so the lack of clinical effect can hardly be attributed to faulty software.

Despite the unexpected result, Alexander Hann, Professor of Digital Transformation in Gastroenterology at the University Hospital Würzburg and co-author of the study, said: “We are pleased that with this work we were able to show that an AI for colorectal cancer screening could be built using outpatient data from Germany in a university setting, and that it was tested where screening is performed every day: in specialized gastroenterological practices.”


The authors identify several possible reasons for the absence of an effect. For one, the observed ADR of 32.9 percent in the control group was considerably higher than the 25 percent assumed when the study was planned. The better doctors perform without AI, the less room there is for improvement – a ceiling effect. In addition, the participating physicians were all highly experienced. The study was powered to detect a 9-percentage-point increase in ADR; to statistically demonstrate an effect as small as the one actually observed, more than 6,000 patients per group would have been needed, according to the authors' estimates.
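The authors' sample-size point can be illustrated with the textbook normal-approximation formula for comparing two proportions. This is only a generic sketch – the paper's own power calculation may use different assumptions – but plugging in the observed ADRs shows why the required group sizes explode for small effects:

```python
from math import ceil

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate per-group sample size to distinguish proportions
    p1 and p2 at two-sided alpha = 0.05 with 80 % power
    (normal approximation; z-quantiles hard-coded for that case)."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)

# Observed ADRs: 32.9 % in the control arm vs. 34.5 % with CADe
n = n_per_group(0.329, 0.345)
print(n)  # comfortably above the authors' lower bound of 6,000 per group
```

With a 1.6-point difference, the formula demands on the order of ten thousand patients per arm – consistent with the authors' statement that far more than the roughly 457 patients per group actually enrolled would have been necessary.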

These findings are part of a growing body of critical evidence. A comprehensive meta-analysis from 2024, which evaluated 43 randomized controlled trials, did find a statistically significant increase in ADR through AI – but rated the quality of evidence as “very low” and pointed to substantial publication bias. Recent RCTs from Japan and the USA likewise showed no significant differences among experienced endoscopists. Accordingly, the American Gastroenterological Association (AGA) makes no recommendation for CADe systems in its current guideline, while the European Society of Gastrointestinal Endoscopy (ESGE) issues only a “weak recommendation.”

Two articles previously published by heise online help put these results into perspective. In an interview with heise online last year, Alexander Hann described the challenges of AI polyp detection under real-world conditions, emphasizing the gap between promising study results from academic centers and the everyday routine of experienced physicians in private practice.

The debate gains particular urgency from a study on the so-called deskilling effect, published in The Lancet Gastroenterology & Hepatology and also covered by heise online. In it, researchers at four medical centers in Poland studied 19 experienced endoscopists – each with at least 2,000 colonoscopies and an average of 28 years of professional experience. After just three months of regular AI use, these doctors' ADR without AI support dropped from 28.4 to 22.4 percent, a statistically significant decline of 6 percentage points; 15 of the 19 doctors showed a decrease. The authors spoke of a “silent erosion of fundamental skills.”

The team around Alexander Hann is also currently intensively investigating the “benefit or harm [of AI support] for colleagues who are in training.”

(vza)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.