AI in medicine: study shows shortcomings of commercial image analysis models
Various commercial AI models for analyzing medical image data can be influenced by written annotations on the images.
(Image: Kateryna Kon/Shutterstock.com)
Commercial AI models often do not yet achieve the accuracy in image analysis that would be necessary for clinical application. This is the conclusion reached by researchers who have investigated the use of various AI models to analyze medical image data. They found that prompt injection has a significant impact on the output of multimodal AI models. This occurs when additional text information acts as unintentional "prompts" that can influence the decision-making of the AI models. However, handwritten labels or watermarks are not uncommon on histopathology images.
For the study published in the journal NEJM AI, the researchers – led by the University Medical Center Mainz and the Else Kröner Fresenius Center (EKFZ) for Digital Health at TU Dresden (TUD) and other scientists – examined the Claude 3 Opus, Claude 3.5 Sonnet and GPT-4o models. However, these models were not specifically trained with histopathological data beforehand.
The researchers tested these models for their reaction to handwritten labels and watermarks on pathological images. If the additional information was correct, the output of the AI was almost always entirely correct. However, additional, misleading information seemed to cause the models to neglect their actual task and produce incorrect results. This is a particular challenge for histopathology, where handwritten notes or markings on the image data are more common.
Videos by heise
AI models that have been trained on both text and image data are particularly susceptible to these prompt injections. This is according to Prof. Sebastian Försch, head of the Digital Pathology & Artificial Intelligence working group at the Institute of Pathology at the Mainz University Medical Center. "If, for example, an X-ray image of a lung tumor is provided with a text that instructs the model to ignore the tumor, this significantly reduces the model's ability to correctly identify the tumor."
"For AI to support doctors reliably and safely, its weaknesses and potential sources of error must be systematically examined. It is not enough to show what a model can do – we must specifically investigate what it cannot yet do", said Prof. Jakob N. Kather, Professor of Clinical Artificial Intelligence at TUD.
Specially trained AI models are likely to be less prone to errors in response to additional text information. Accordingly, AI models and their results should always be validated by human experts before they are used in clinical practice. The team at the Mainz University Medical Center led by PD Dr. Sebastian Försch is therefore in the development phase for a specific "Pathology Foundation Model".
Doctors bear responsibility for diagnosis and treatment
At the 129th Medical Congress, doctors spoke out in favor of training their own AI models in order to avoid slipping further into dependence on large commercial providers from the USA and China. European and data protection-friendly solutions should therefore be developed. Another important point was that doctors should always have the final say in decisions on diagnosis and treatment.
(mack)