Ontologies & terminologies: How language can be formalized in medicine

In medicine, AI-related mistakes can lead to serious consequences. But there are technologies that can prevent such mistakes.

Head of a human being

(Image: Triff/Shutterstock.com)

By Dr. André Sander
This article was originally published in German and has been automatically translated.

In critical areas such as medicine, integrating artificial intelligence is anything but trivial, because wrong decisions made by AI can have serious consequences. However, there are technologies that can make current AI models more reliable and safer.

In times of corona, everyone was suddenly talking about "incidences" and "R-values" as a matter of course; today it is "LLM", "transformer architectures" and "generative models" that are on everyone's lips. Above all, however, a whole range of very different technologies is reduced to the two letters "AI". What we generally refer to as "AI" today is just one of several approaches that can be used to implement artificial intelligence.

AI in the EU

The EU categorizes artificial intelligence as follows:

  • Machine Learning
  • Reasoning
  • Robotics

Slightly more differentiated at the implementation level:

  • Machine learning concepts
  • Logic and knowledge-based concepts
  • Statistical approaches

The technologies do not compete with each other, but rather complement and in some cases even depend on each other. This is particularly evident in the omnipresent problem of hallucinations of LLMs. One common solution is simply: rules. Incidentally, this not only addresses hallucinations, i.e. false statements, but also problems such as racism and chauvinism in the models' responses.

In medicine – in contrast to machine learning models – knowledge is stored in terminologies and ontologies. The original idea of ontologies is – as the name suggests – a development of philosophy, especially metaphysics. It is about dividing the world into the real and the possible. Although the roots go back some 2500 years, the actual concept was only taken up and defined in the last 500 years by philosophers such as Goclenius, Hegel and Kant.

History of ontology

The aim of an ontology is the structured representation of the world and "to explicate, through conceptually based deduction, all those determinations that can be attributed to beings as such and that are therefore of the highest generality" (Christian Wolff). In other words, the aim is nothing other than to represent knowledge and to derive implied knowledge from it.

With the emergence of computer science as a discipline, this idea was taken up at an early stage and has been studied and adapted since the 1950s. Initially, so-called semantic networks were introduced in the 1960s, driven primarily by linguists. These networks mapped relationships between things and their attributes at a fairly low level of abstraction.

A short time later, in the early 1970s, the construct of "frames" was proposed, in particular by Marvin Minsky, which mapped knowledge in small, defined units. This laid the technical foundation for rule-based artificial intelligence.

It quickly became clear, however, that a certain degree of formalization was necessary from a computer science perspective. The languages that were developed to represent knowledge are called description logics and are based on formal logic. The first knowledge systems implemented on this basis were created in the second half of the 1980s. The state of the art was summarized by Baader et al. in 2003 in their "Description Logic Handbook", which is now considered a standard work.
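The idea of representing knowledge and deriving implied knowledge from it can be illustrated with a minimal sketch. It is not a description logic reasoner; it only follows "is a" relationships transitively. The concept names and the functions `build_parents`, `ancestors` and `is_subsumed_by` are hypothetical and not taken from any real terminology or library.

```python
from collections import defaultdict

# Hypothetical mini knowledge base: each pair states "child is_a parent".
# The concept names are illustrative, not copied from a real terminology.
IS_A = [
    ("type 1 diabetes mellitus", "diabetes mellitus"),
    ("type 2 diabetes mellitus", "diabetes mellitus"),
    ("diabetes mellitus", "metabolic disease"),
    ("metabolic disease", "disease"),
]


def build_parents(pairs):
    """Index the stated is_a pairs by child concept."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    return parents


def ancestors(concept, parents):
    """Return all concepts reachable from `concept` via is_a (transitive closure)."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(concept, candidate, parents):
    """True if `candidate` is the concept itself or one of its ancestors."""
    return concept == candidate or candidate in ancestors(concept, parents)


PARENTS = build_parents(IS_A)
print(is_subsumed_by("type 1 diabetes mellitus", "disease", PARENTS))
```

Only four relationships are stated explicitly, yet the sketch can answer that type 1 diabetes mellitus is a disease. That statement is implied knowledge: it is derived from the stated pairs rather than stored.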

Terminologies have long been a means of structured documentation in medicine. Classifications in particular, a special form of terminology mainly used to aggregate cases into classes, have been in use since the end of the 19th century. Initially conceived for mortality statistics (for analyzing causes of death), the International Classification of Diseases (ICD; full name "International Statistical Classification of Diseases and Related Health Problems") has evolved to cover morbidity (diseases) as well and today forms an important basis for controlling the financing of healthcare systems in many countries around the world.

While large parts of the ICD have been removed over the course of its versions, primarily the so-called external causes (for example, very detailed causes of death due to acts of war), other areas have been represented in ever greater detail. In particular, the so-called widespread diseases, especially diabetes, can now be mapped very precisely. This is because the ICD is used as the basis for distributing the health budget.

The ICD has been updated irregularly, approximately every five to ten years, and is now in its 10th routine revision. At present, the classification is also being transformed into a terminology and has therefore changed considerably in structure with version 11. Hierarchies such as those shown in the illustrations for the example of diabetes are no longer readily possible.

The example of "diabetes" vividly illustrates the problem that arises for computer scientists: the data and structures are highly volatile. Classifications are not fixed works; they evolve with the state of knowledge in medicine and, importantly, with their intended use. In principle, each version of the ICD can be understood as an independent work, and yet continuity must be ensured. From the point of view of medicine, epidemiology, financing and actual care, there should be no breaks when classification systems change. These changes should be evolutionary, "flowing" processes, even if there is often talk of "revolution".

Classification in transition (4 images)

Excerpt from the ICD-6 from 1948/1951 (creation, publication). There was exactly one code (260) for the entire clinical picture of "diabetes". (Image: Bundesinstitut für Arzneimittel und Medizinprodukte)

Citizens in Germany do come into contact with classifications, especially ICD-10, for instance on a sick note or in the Federal Clinic Atlas (Bundes-Klinik-Atlas, in the version with "free search"). Both contain codes that originate from ICD-10. For example, if you visit your GP with a typical respiratory infection, you will usually find the code "J06.9" on the sick note, which stands for "Acute upper respiratory tract infection, unspecified". Incidentally, the copy for the employer does not contain this code, so the employer does not learn why an employee has been signed off sick.
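How a classification aggregates individual codes into classes can be sketched in a few lines. The sketch assumes a simplified code shape (one letter, two digits, optionally a dot and one or two further digits); real ICD-10 editions allow more variants. The names `split_icd10` and `count_by_category` are hypothetical.

```python
import re
from collections import Counter

# Simplified shape: letter + two digits, optionally "." and one or two digits.
# This is an assumption for illustration, not the full ICD-10 syntax.
ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.(\d{1,2}))?$")


def split_icd10(code):
    """Split a code such as 'J06.9' into its three-character category and subcategory."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised ICD-10 code shape: {code!r}")
    letter, digits, subcategory = match.groups()
    return {"category": letter + digits, "subcategory": subcategory}


def count_by_category(codes):
    """Aggregate detailed codes into their three-character categories."""
    return Counter(split_icd10(code)["category"] for code in codes)


print(split_icd10("J06.9"))
```

Counting by category rather than by full code is the kind of aggregation that statistics and budget allocation rely on: the detailed code is kept for documentation, while the class is what gets counted.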

Example of a certificate of incapacity for work in the version for the insured person. The ICD-10 code "K40.90" stands for an inguinal hernia, the G for "confirmed" and the R for "right".

(Image: André Sander)
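An entry like the one on the certificate can be broken down into its parts by software. The following is a minimal sketch that assumes a whitespace-separated layout such as "K40.90 G R". The G marker is rendered here as "confirmed" and R as "right", as in the caption; the other marker letters are an assumption based on common ICD-10-GM practice and should be verified against the official coding rules. `CodedDiagnosis` and `parse_diagnosis` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# "G" and "R" follow the certificate example; the remaining letters are an
# assumption based on common ICD-10-GM practice and should be verified.
CERTAINTY = {"G": "confirmed", "V": "suspected", "A": "excluded", "Z": "condition after"}
SIDE = {"R": "right", "L": "left", "B": "both sides"}


@dataclass
class CodedDiagnosis:
    code: str
    certainty: Optional[str] = None
    side: Optional[str] = None


def parse_diagnosis(entry: str) -> CodedDiagnosis:
    """Parse a whitespace-separated entry such as 'K40.90 G R' (hypothetical layout)."""
    parts = entry.split()
    if not parts:
        raise ValueError("empty diagnosis entry")
    diagnosis = CodedDiagnosis(code=parts[0])
    for marker in parts[1:]:
        if marker in CERTAINTY:
            diagnosis.certainty = CERTAINTY[marker]
        elif marker in SIDE:
            diagnosis.side = SIDE[marker]
        else:
            raise ValueError(f"unknown marker: {marker!r}")
    return diagnosis


print(parse_diagnosis("K40.90 G R"))
```

Rejecting unknown markers instead of silently ignoring them is a deliberate choice in this sketch: in medical documentation, an unrecognised qualifier is more likely a data error than something safe to drop.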

In the Federal Clinic Atlas, users must select a diagnosis or treatment when entering a search term. The diagnoses carry an additional code, which is again an ICD-10 code. A different but similar classification is used for treatments: the Operation and Procedure Classification (Operationen- und Prozedurenschlüssel, OPS for short).

Excerpt from the Federal Clinic Atlas in the version with free search: when entering "hernia", the user receives a list of suitable ICD-10 codes (in brackets).

(Image: BMG)

"End users" (insured persons, patients, but also specialist staff) have little or no direct contact with actual terminologies and ontologies. Nevertheless, terminologies and ontologies are already making an important contribution to the digitalization of the healthcare system and are increasingly being used in modern standards, even being mentioned by name in laws. The Digital Act (DigiG), for example, regulates the "right to interoperability" (Section 386) and describes in detail which data must be transmitted in a semantically and syntactically interoperable manner.

Today, anyone who fills an e-prescription, receives an electronic certificate of incapacity for work or downloads an electronic patient file or vaccination record onto their smartphone uses terminologies in the background.

Excerpt from the specification of a vaccination-relevant diagnosis from the MIO vaccination record of the KBV. The diagnosis is coded with both an ICD-10 code and a SNOMED CT code.

(Image: Simplifier.net)

Note: Dr. André Sander completed his doctorate in medical sciences at Charité and has been working with medical terminologies and ontologies for more than 25 years. At ID Information und Dokumentation, he is CTO, authorized signatory and member of the management board.
