German Interoperability Day: How does AI affect standards?
At the German Interoperability Day, participants discussed the influence of AI on documentation and interoperability standards.
Dr Carina N. Vorisek from Charité at the 10th German Interoperability Day.
(Image: heise medien)
One of the focal points of the tenth German Interoperability Day (DIT) was the topic "Artificial intelligence meets/eats interoperability standards". Experts also discussed the opportunities and limitations of AI systems in clinical practice. Dr Kai U. Heitmann from HL7 Germany began by emphasising the need for structured data: AI can only work reliably if medication, findings, and diagnoses are standardised. A lack of structure could lead to assistance systems making incorrect treatment decisions. Interoperability standards such as HL7 FHIR and SNOMED CT are crucial for enabling safe and traceable AI applications in the healthcare sector.
Dr Carina N. Vorisek from Berlin's Charité hospital focused on the structural problems behind many AI models in medicine. "This should by no means be an AI bashing today […]. Bias has such a negative connotation that we say this bias, which we don't actually want, stigmatises. […] But there is also this positive bias that we bring in differences and understand which people respond differently to therapies and diagnoses and why," she explained. Studies have shown that women, children, older people and people from lower-income countries are underrepresented in many medical datasets. As a result, algorithms deliver poorer results for these groups. A fair AI system must reflect diversity and take therapeutic differences into account.
Vorisek also presented a study that investigated how well large language models can encode medical data according to SNOMED CT. The result: human experts continue to work more precisely. No model achieved the quality of a human coder, and many even generated fictitious codes, Vorisek criticised. Heitmann warned against blind trust in AI systems. Medical AI needs standards to process medical data correctly, safely and comprehensibly. It should not replace doctors, but help to provide better medicine for everyone and be fair, reliable, and transparent.
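Fictitious codes of the kind Vorisek criticised can at least be detected mechanically by checking every proposed code against a trusted reference list. The following is a minimal sketch in Python, not the method used in the study: the table `KNOWN_CONCEPTS` (a tiny illustrative subset, not a terminology release) and the helper `split_valid_codes` are assumptions made for this example.

```python
# Tiny illustrative subset of SNOMED CT concepts (assumption for this sketch).
# A real check would query a licensed terminology release or server instead.
KNOWN_CONCEPTS = {
    "22298006": "Myocardial infarction",
    "38341003": "Hypertensive disorder",
    "73211009": "Diabetes mellitus",
}


def split_valid_codes(proposed):
    """Separate model-proposed codes into known and unknown ones."""
    valid, unknown = [], []
    for code in proposed:
        if code in KNOWN_CONCEPTS:
            valid.append(code)
        else:
            unknown.append(code)
    return valid, unknown


# Hypothetical model output; the last code is made up for illustration.
valid, unknown = split_valid_codes(["22298006", "73211009", "99999999999"])
```

Such a lookup only catches codes that do not exist. A code that exists but describes the wrong concept still needs review by a human coder.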
"Do we still need FHIR?"
Dr Philipp Daumke from Averbis GmbH put forward a provocative thesis: "With LLMs, computers understand semantics first. And as hard as it is, this means that FHIR is simply no longer necessary at this point". He traced the development of the last 20 years: from rule-based expert systems, ontologies and big data methods to today's generative models. According to Daumke, LLMs "understand" medical meaning directly. Structuring the same information a second time then creates sources of error rather than added value. He also criticised the fact that there is no single FHIR standard, but rather a multitude of contradictory profiles, which increases complexity. AI, by contrast, could convert content into whatever format a given situation requires, whether a doctor's letter, a patient app or a research database.
"Structure remains the basis"
Dr René Hosch from the Institute for Artificial Intelligence in Medicine (IKIM) at Essen University Hospital countered that without a structured basis, neither research nor scaling nor traceability is possible. Essen University Hospital operates one of the largest FHIR implementations in Europe with over two billion resources, a basis that makes research, care, and quality assurance possible in the first place.
Hosch showed, among other things, how FHIR resources can be generated automatically from free text, how data can be analysed on the basis of FHIR, and a semantic search dashboard that makes clinical documents accessible to doctors via AI queries. He sees standards as a way of "further structuring the unstructured world". AI could help to structure texts, but only standardised formats make the results interoperable, validatable and testable. He pleaded for "small-language models", specialised systems for clearly defined medical applications that can also be operated with limited hardware: in his opinion, there is no need for billion-dollar models, but rather for practical models with a clinical focus.
Agents as a bridge between man and machine
Julius Severin from the Danish company Corti, which develops AI-supported speech and documentation systems for the healthcare sector, began his presentation by pointing out that medical staff spend around 35 percent of their working time on documentation. Corti originally supported emergency call centres in the detection of strokes and heart attacks and has since focused on automatic call recording, fact recognition and coding of medical content. Corti wants to create a controllable intermediate level in which the human remains in the loop and retains control.
Between the transcript and the finished report, the system first extracts "atomic facts", i.e. elementary units of information that are machine-readable but can be checked by humans. Based on these, specialised agents can take over final tasks, such as coding according to ICD or SNOMED or filling out digital forms via FHIR interfaces. According to Severin, these modular, comprehensible systems should manage "the balancing act between automation and responsibility".
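The idea of an intermediate level that a human can check can be illustrated with a short Python sketch. It is not Corti's implementation: the `AtomicFact` class, the `approved` flag and the helper `to_fhir_condition` are hypothetical names chosen for this example, and the resource contains only a few fields of a FHIR Condition.

```python
from dataclasses import dataclass


@dataclass
class AtomicFact:
    """One elementary statement extracted from a transcript (hypothetical type)."""
    text: str
    approved: bool = False  # set to True once a human has checked the fact


def to_fhir_condition(fact, snomed_code, display, patient_ref):
    """Build a minimal FHIR Condition resource from a human-approved fact."""
    if not fact.approved:
        # Keep the human in the loop: unreviewed facts are never exported.
        raise ValueError("fact has not been reviewed by a human")
    return {
        "resourceType": "Condition",
        "subject": {"reference": patient_ref},
        "code": {
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": snomed_code,
                "display": display,
            }],
            "text": fact.text,
        },
    }


fact = AtomicFact("Patient reports known diabetes mellitus.")
fact.approved = True  # simulated human sign-off
condition = to_fhir_condition(
    fact, "73211009", "Diabetes mellitus", "Patient/example"
)
```

In this sketch the export step refuses any fact that has not been approved, which is one simple way to make human review a hard precondition rather than an optional step.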
Overall, the participants agreed that AI will change everyday clinical practice, but that this change should take place under clear framework conditions. "From ten simultaneous enquiries, A100 in the basement is no longer enough," said Daumke. At that point, Averbis falls back on Azure services with European data storage or on German cloud partners such as Stackit. On-premise variants would quickly reach their technical limits as soon as many users were working in parallel. Hosch, for his part, described it as a significant development that locally running AI already delivers "relatively useful results".
On the question of large versus small models, Hosch defended specialised small language models, as they ensure greater efficiency, data protection and control. Others argued that language models from commercial providers often deliver better results, but that hybrid strategies could be used, for example to decide in advance which model is responsible for a task. "Orchestration", in which several models or agents work together and exchange information, was therefore considered particularly important. There was also consensus that standards such as FHIR are not losing their value, but are changing their role. Where previously each data field had to be assigned manually, AI models can now recognise and structure content and store it in the appropriate standard.
(mack)