Chatbots in medicine: More incorrect advice when the input contains typos
When AI technology is used to evaluate descriptions of complaints, it produces incorrect advice noticeably more often if the input contains spelling mistakes, for example.
(Image: Pixfiction/Shutterstock.com)
AI chatbots that are supposed to give people simple medical advice are noticeably influenced by spelling mistakes, superfluous spaces or "unsafe, dramatic or informal" wording. A research group at the Massachusetts Institute of Technology in the USA found this out and considers it strong evidence that such systems need to be vetted more rigorously before they are used in this way – even though they are already being deployed for it. The study shows that the AI technology picks up non-medical information and lets it flow into the generated recommendations in a way that had not been documented before. In addition, the error rate is higher for women than for men.
More incorrect advice after spelling mistakes
As the team explains, it used AI to generate thousands of minimally modified patient messages for the study, which were then evaluated by widely used chatbots such as GPT-4. The chatbots were supposed to answer whether the person in question should stay at home, come in for an appointment, or whether a laboratory test might be necessary. As soon as the evaluated messages contained certain errors of the kind people make in everyday communication, the chatbots noticeably more often recommended that patients take care of the problem themselves – which in those cases would have been the wrong advice.
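In principle, such a perturbation test can be sketched as follows. The snippet is purely illustrative and is not the research team's code; the helper functions, the example message and the prompt wording are assumptions made here for demonstration:

```python
import random

# Purely illustrative sketch (not the research team's code): inject the kinds
# of non-medical noise described in the article into a patient message and
# build a triage prompt for a chat model. All names and texts are assumptions.

def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def add_extra_space(text: str, rng: random.Random) -> str:
    """Duplicate the space after a randomly chosen word."""
    words = text.split(" ")
    i = rng.randrange(len(words))
    words[i] += " "
    return " ".join(words)

def make_informal(text: str) -> str:
    """Append a dramatic/informal closing of the kind the study varied."""
    return text + " honestly not sure it's a big deal lol"

TRIAGE_PROMPT = (
    'A patient writes: "{message}"\n'
    "Should the patient (a) manage this at home, (b) come in for a visit, "
    "or (c) get a laboratory test first? Answer with a, b or c."
)

if __name__ == "__main__":
    rng = random.Random(0)
    original = "I have had a sharp pain in my chest since yesterday and feel dizzy."
    perturbed = make_informal(add_extra_space(add_typo(original, rng), rng))
    print(TRIAGE_PROMPT.format(message=perturbed))
    # The comparison of interest: does the model's answer shift from (b)/(c)
    # for the original message to (a) for the perturbed one?
```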
The research team also noticed that the chatbots recommended home self-care to female patients much more frequently than to men. Anyone who only checks the accuracy of the answers would miss some of the "worst results", in which staying at home is recommended despite serious ailments, the team warns. While such errors could have extremely problematic consequences, that does not apply to errors in the other direction – i.e. when patients are asked to come in even though it is not necessary. In a purely statistical accuracy analysis, however, such harmless errors can hide the dangerous ones.
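A simple, made-up calculation illustrates the point; the numbers are invented for illustration and do not come from the study:

```python
# Invented example numbers (not from the study): two triage models with the
# same overall accuracy, but very different risks behind their errors.
cases = 1000
correct = 900  # both models decide 90% of cases correctly

errors = {
    "Model A": {"unnecessary_visit": 100, "wrongly_sent_home": 0},
    "Model B": {"unnecessary_visit": 0, "wrongly_sent_home": 100},
}

for name, e in errors.items():
    accuracy = correct / cases
    print(f"{name}: accuracy = {accuracy:.0%}, "
          f"patients wrongly sent home = {e['wrongly_sent_home']}")
# Both report 90% accuracy; only the second figure exposes the dangerous errors.
```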
As a possible cause of the errors, the team points out that chatbots are often trained and tested only on texts from medical examinations; in actual use, however, they encounter texts that are very different from these. This is another reason why they need to be tested more thoroughly before being used in medicine. The group now wants to investigate further how AI deals with the natural language of particular population groups. It also wants to find out how the technology infers people's gender from texts. The study can be viewed online.
(dmk)