Ontologies in medicine: structure and creation

On the structure and challenges of ontologies and terminologies using the example of medicine.

Library with an open book containing a stethoscope.

(Image: Chinnapong/Shutterstock.com)

By Dr. André Sander
This article was originally published in German and has been automatically translated.

The building blocks of ontologies are terminologies, i.e. collections of technical terms in a specific domain (the "technical language"). It is important to note that a "term" in the terminological sense is not a single word, but a concept in the sense of an intellectual unit. Such a concept can have several designations (also called labels). This is particularly helpful in medicine, as there are often German, Greek and colloquial designations in addition to the Latin ones. The medical concept "mumps" (from English) can also be referred to as "Ziegenpeter" (German), "Salivitis epidemica" (Latin) or "(Bauern)Tölpel" (colloquial).
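The distinction between a concept and its designations can be sketched as a small data structure. The following Python snippet is only an illustration: the class name, the identifier "C-MUMPS" and the grouping of labels are assumptions made for this example and do not come from any real terminology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    """One unit of meaning that can carry many designations (labels)."""
    concept_id: str  # hypothetical identifier, not a real code
    labels: Dict[str, List[str]] = field(default_factory=dict)

    def all_labels(self) -> Set[str]:
        """Return every designation, regardless of language or register."""
        return {label for group in self.labels.values() for label in group}


# The designations are taken from the text; "(Bauern)Tölpel" is split
# into its two readings here, which is an interpretation.
mumps = Concept(
    concept_id="C-MUMPS",
    labels={
        "english": ["mumps"],
        "german": ["Ziegenpeter"],
        "latin": ["Salivitis epidemica"],
        "colloquial": ["Bauerntölpel", "Tölpel"],
    },
)


def find_concept(text: str, concepts: List[Concept]) -> Optional[Concept]:
    """Resolve a designation to its concept, ignoring upper and lower case."""
    needle = text.casefold()
    for concept in concepts:
        if needle in {label.casefold() for label in concept.all_labels()}:
            return concept
    return None
```

Whichever designation a user types, the lookup ends at the same concept, which is the property that distinguishes a terminology from a plain word list.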

An ontology now describes the relationships between the terms in a terminology. When a doctor hears the term "mumps", a contextual space is mentally formed that contains the symptoms of mumps (swollen parotid glands), possible therapies (antipyretic agents), risk factors (immunosuppression), concomitant diseases (pancreatitis, anemia) and the like. This associated knowledge is mapped in an ontology.
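As a rough sketch, this "contextual space" can be thought of as a set of subject-role-object statements. The role names below (has_symptom, has_therapy and so on) are invented for illustration; real ontologies define their own roles in a concept model.

```python
from collections import defaultdict

# (subject, role, object) statements taken from the mumps example in the text.
# The role names are assumptions made for this sketch.
TRIPLES = [
    ("Mumps", "has_symptom", "Swollen parotid glands"),
    ("Mumps", "has_therapy", "Antipyretic agents"),
    ("Mumps", "has_risk_factor", "Immunosuppression"),
    ("Mumps", "has_concomitant_disease", "Pancreatitis"),
    ("Mumps", "has_concomitant_disease", "Anemia"),
]


def context_of(term, triples):
    """Collect everything that is directly related to a term, grouped by role."""
    context = defaultdict(list)
    for subject, role, obj in triples:
        if subject == term:
            context[role].append(obj)
    return dict(context)


mumps_context = context_of("Mumps", TRIPLES)
```

Here mumps_context groups the related terms by role, for example both concomitant diseases under "has_concomitant_disease".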

So-called "description logics" (DL) are used for this purpose. These formalize the representation of knowledge and ultimately enable algorithmic use, such as logical reasoning. Description logics are a family of formal logic languages that differ in expressive power, i.e. in the statements they can represent. The languages are divided into three groups:

All languages have in common that at least basic logical expressions, such as "and" and "there exists", are supported. If additional features are added to one of these languages, they are coded as letters and appended to the name of the basic language. The possibility of linking terms with "or" is denoted by "U", for example. If role hierarchies can be mapped, an "H" is added. An "ALCUI" language is therefore an attributive language (AL) with which "or" (U), "not" (C) and so-called inverse properties (I) can also be mapped. If transitive roles are supported as well, "ALC" is conventionally abbreviated to "S", which gives "SUI".

The latter are an interesting way of turning questions around: a bone can break, but what else can break? Other important extensions are cardinalities (N, Q), which can represent statements such as "hexadactyly is a hand with six fingers", and hierarchical roles (H). For example, hair loss is a common side effect of beta-blockers, and a common side effect is a side effect; therefore, hair loss is a side effect of beta-blockers. The syntax with which the elements of the language are expressed should be familiar from computer science studies:

∀ (for all), ∃ (there exists), ∪ (or), ∩ (and) and a few others.
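The two extensions mentioned above, hierarchical roles (H) and inverse properties (I), can be imitated with a few lines of Python. This is a minimal sketch and not a description logic reasoner; all role and term names are assumptions chosen to match the examples in the text.

```python
# Role hierarchy: each role points to its more general role (illustrative names).
ROLE_PARENTS = {"common_side_effect_of": "side_effect_of"}

# Asserted statements as (subject, role, object).
ASSERTIONS = {
    ("HairLoss", "common_side_effect_of", "BetaBlocker"),
    ("Bone", "can_undergo", "Fracture"),
    ("Tooth", "can_undergo", "Fracture"),
}


def with_super_roles(assertions, role_parents):
    """Add every statement that follows from the role hierarchy."""
    inferred = set(assertions)
    for subject, role, obj in assertions:
        seen = {role}  # guards against a cyclic role hierarchy
        parent = role_parents.get(role)
        while parent is not None and parent not in seen:
            inferred.add((subject, parent, obj))
            seen.add(parent)
            parent = role_parents.get(parent)
    return inferred


def inverse_lookup(assertions, role, obj):
    """Turn the question around: which subjects are linked to obj via role?"""
    return {s for s, r, o in assertions if r == role and o == obj}


all_statements = with_super_roles(ASSERTIONS, ROLE_PARENTS)
breakable = inverse_lookup(ASSERTIONS, "can_undergo", "Fracture")
```

The derived statement that hair loss is a side effect of beta-blockers appears in all_statements although it was never asserted, and breakable answers the question of what else can break.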

As a final example, here is a possible formal definition of mumps:

Mumps ≡ ∃Trigger.MumpsVirus ∩ ∃Pathology.Infection

Mumps is therefore present if there is an infection with a mumps virus.
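To make the definition tangible, the following sketch checks whether a set of recorded findings satisfies both existential restrictions. It is a simplified, closed-world check written for illustration, not how a description logic reasoner works internally; the small hierarchy and all names in it are assumptions.

```python
# Subsumption hierarchy: child -> parent (illustrative, not from a real ontology).
IS_A = {
    "MumpsVirus": "Virus",
    "MeaslesVirus": "Virus",
    "ViralInfection": "Infection",
}

# Mumps ≡ ∃Trigger.MumpsVirus ∩ ∃Pathology.Infection
MUMPS_DEFINITION = {"Trigger": "MumpsVirus", "Pathology": "Infection"}


def is_a(concept, ancestor, hierarchy):
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = hierarchy.get(concept)
    return False


def satisfies(findings, definition, hierarchy):
    """Check every restriction: each role needs at least one matching filler.

    findings maps a role name to the list of concepts recorded for it.
    """
    return all(
        any(is_a(filler, required, hierarchy) for filler in findings.get(role, []))
        for role, required in definition.items()
    )


patient_a = {"Trigger": ["MumpsVirus"], "Pathology": ["ViralInfection"]}
patient_b = {"Trigger": ["MeaslesVirus"], "Pathology": ["ViralInfection"]}
```

Under these assumptions, patient_a satisfies the definition and patient_b does not, because a measles virus is not a mumps virus.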

Definition of the term "mumps" in SNOMED CT

(Image: SNOMED 2024 International)

In practice, ontologies are usually defined and exchanged using the Web Ontology Language (OWL) and can be edited on this basis using open-source tools. The OBO format, in which the Open Biomedical Ontologies were originally defined, has also become established in medicine.

There are around one thousand medical ontologies that are more or less formally defined, i.e. they are at least available in OWL or OBO format.

Many ontologies are specialized in subdomains and only represent certain parts of medicine. Content overlaps can be seen as perspectives, as the ontological modeling depicts different aspects. Merging this information greatly expands the context of the content, but sometimes requires complex tools that can map terms from different ontologies to each other (so-called terminology servers).

Name        Full name                                             Domain                                                      Terms (in thousands)
SNOMED CT   Systematized Nomenclature of Medicine Clinical Terms  All areas of medicine through to sociodemographic aspects   400
LOINC       Logical Observation Identifiers Names and Codes       Vital, laboratory and measured values (through to devices)  285
FMA         Foundational Model of Anatomy                         Anatomy                                                     105
GO          Gene Ontology                                         Genes                                                       51
RADLEX      Radiology Lexicon                                     Radiology/reporting                                         45
HPO         Human Phenotype Ontology                              Phenotyping                                                 21
ORPHA/ORDO  Orphanet Rare Disease Ontology                        Rare diseases                                               15

The advantages of using rule-based AI based on ontologies are obvious: the "training" takes place in the form of a manual definition and is carried out by humans, which means that the training material itself is, at best, a single textbook. This advantage becomes particularly clear when you consider that many diagnoses in medicine are rare. Spontaneous Creutzfeldt-Jakob disease is probably diagnosed fewer than ten times a year in Germany, so training a machine learning algorithm to recognize this diagnosis is extremely difficult. The situation is similar with pathogenic bacteria: in percentage terms, only a few of the known species are responsible for the majority of all bacterial infections.

In an ontology, it is not relevant whether a contained term occurs frequently or rarely. In addition, the algorithms work transparently, and the results can be visualized in such a way that they are comprehensible at all times. Individual errors in the ontology can be corrected in a targeted manner without having to update or revise the entire system. But there are also disadvantages: Human training is time-consuming, expensive and, to a certain extent, subjective. In addition, specific errors can occur that make it impossible to use (inference) algorithms: Circular reasoning leads to endless loops and contradictory statements lead to unwanted terminations.
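Circular definitions of this kind can be found before an inference algorithm ever runs. The sketch below performs a depth-first search over a hierarchy given as a plain dictionary; the function name and the input format are assumptions for this example, and the recursive form is only suitable for small hierarchies because of Python's recursion limit.

```python
def find_cycle(is_a):
    """Return one circular chain of terms as a list, or None if there is none.

    is_a maps each term to the list of its parent terms.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(term):
        state[term] = GREY
        path.append(term)
        for parent in is_a.get(term, []):
            if state.get(parent, WHITE) == GREY:
                # The parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state.get(parent, WHITE) == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[term] = BLACK
        return None

    for term in list(is_a):
        if state.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


circular = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"A": ["B"], "B": ["C"]}
```

Running such a check after every editing step keeps the error local, which fits the point made above that individual errors can be corrected in a targeted manner.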

The creation of an ontology is a complex, intellectually demanding and by no means trivial process. The central building block of an ontology is the so-called "concept model", which must be explicitly defined and should not result from the definition of knowledge. The concept model specifies which (semantic) roles may be used and when a term is fully modeled or defined.

The following is a series of typical and partially overlapping problems that must be considered when creating ontologies:

"Medicine is big and complicated", wrote Alan Rector in the early 2000s. This is reflected above all in the almost 400,000 terms in SNOMED CT, which are linked to each other by over 1.5 million relations. The 1 GB of raw data is hardly a challenge for today's computer systems, but the algorithms have to work efficiently and quickly on such large networks. Maintaining such large systems is also extremely time-consuming: the effects of a change sometimes extend over hundreds to thousands of terms.
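How far a change reaches can be estimated by collecting every term that lies below the changed term in the hierarchy. The following sketch does this with a breadth-first search; the tiny hierarchy is invented for illustration and has nothing to do with the actual content of SNOMED CT.

```python
from collections import defaultdict, deque


def affected_terms(changed, is_a):
    """Return all direct and indirect children of a changed term.

    is_a maps each term to the list of its parent terms.
    """
    # Invert the hierarchy so that children can be looked up from a parent.
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)

    seen = set()
    queue = deque([changed])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Illustrative hierarchy (assumption for this sketch).
HIERARCHY = {
    "Viral infection": ["Infection"],
    "Mumps": ["Viral infection"],
    "Mumps orchitis": ["Mumps"],
    "Fracture": ["Injury"],
}

impact = affected_terms("Infection", HIERARCHY)
```

In this toy example a change to "Infection" touches three further terms; on a network with over a million relations the same traversal has to be implemented efficiently.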

At what level should a terminology or ontology end? In humans, physiology is typically considered down to the level of proteins and molecules and, in the context of biochemistry, extends deep into chemistry itself. So where should the relationships be mapped to? In SNOMED CT, many substances and their effects are mapped, but no further relationships, such as indication or contraindication. This quickly takes you into areas that lie outside human medicine.

If, for example, you look at zoonoses, i.e. diseases that can switch between humans and animals, then the corresponding hosts must, of course, be included in the terminology. But is it important to also describe their symptoms? There are certainly good reasons for this, as it could potentially facilitate diagnostics in humans. A clear specification for the range of an ontology can hardly be made. The granularity should be based primarily on the use cases.

Medical terms can sometimes become quite complex. This applies not only to the terms and designations themselves (e.g. oligoasthenoteratozoospermia), but above all to the concepts behind them. The "arterial switch operation", for example, described with three basic words, is an extremely complicated operation in which the pulmonary artery and the aorta are disconnected from the heart and then reconnected. The "Whipple operation" is defined as "Partial duodenopancreatectomy with partial resection of the stomach (as well as the gallbladder, distal bile duct and gastric antrum)". Even the alternative description is complex. Such terms sometimes also contain temporal and causal relationships that may not be representable within the ontology.

Terms can be mapped both pre- and post-coordinated with or in ontologies. For example, the term "acute myocardial infarction" consists of two components: the diagnosis and an assigned attribute. You can now add the entire term to an ontology in the same way (pre-coordinated) or you can add the two individual components and only connect them during the individual mapping (post-coordinated). The advantage of post-coordination is that significantly fewer terms have to be added. The disadvantage is that the mapping requires a special syntax (e.g. "Expression Constraint Language").
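The saving in the number of terms can be illustrated with a small calculation. In the sketch below the lists of diagnoses and attributes are invented, and the dictionary returned by postcoordinate is only a stand-in for a real expression syntax.

```python
from itertools import product

# Illustrative components (assumptions for this sketch).
DIAGNOSES = ["myocardial infarction", "pancreatitis", "appendicitis"]
COURSES = ["acute", "chronic", "recurrent"]

# Pre-coordinated: every combination is stored as its own term.
precoordinated = [
    f"{course} {diagnosis}" for course, diagnosis in product(COURSES, DIAGNOSES)
]

# Post-coordinated: only the components are stored.
postcoordinated_vocabulary = DIAGNOSES + COURSES


def postcoordinate(diagnosis, course):
    """Combine two stored components at mapping time."""
    if diagnosis not in DIAGNOSES or course not in COURSES:
        raise ValueError("unknown component")
    return {"focus": diagnosis, "clinical_course": course}


expression = postcoordinate("myocardial infarction", "acute")
```

With three diagnoses and three attributes, pre-coordination needs nine terms and post-coordination six. The gap grows multiplicatively with every additional attribute, which is the advantage described above, while the combined expression now needs its own syntax.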

Here is another negative example from SNOMED CT: "Motor vehicle nontraffic accident involving fire starting in motor vehicle, except off-road motor vehicle, while in motion, not on public highway (event)". Such a term certainly does not belong precoordinated in an ontology. A good indicator is the question: can semantic roles be assigned to the term that only apply to this term?

The hope that the use of ontologies will lead to universal semantic interoperability is understandable, but cannot be fulfilled. Each term is based on a formal definition that is partly culturally, partly legally and partly scientifically influenced and therefore does not apply universally. For example, maximum doses of medication can vary from country to country, and medical definitions of new diseases in particular are often not standardized. The definition of stillbirth is a striking example of such differences between countries. While Germany considers the birth weight (<500g) in addition to the actual clinical death, the time of birth (before the 21st or 25th week of pregnancy) also plays a role in countries such as the USA or the UK. In Russia, healthy children who die within the first week of life are counted as stillbirths. Comparative statistics are therefore difficult and not possible on the basis of terminology alone.

In general, ontologies raise the question of how much detail of the respective domain should be mapped or, in other words, in how much detail the world can be mapped with the respective ontology. As with granularity and bandwidth, it also makes sense to map peripheral areas of medicine, such as animals or pollen as a trigger for allergies. But which animals should be included? Is the common domestic cat enough, or does it also have to be a Peterbald cat? Is birch pollen in general enough, or do we need Himalayan birch pollen?

In principle, the answer to this question is simple: If there is a medical use case, then the term must be included. If allergies caused by Himalayan birch pollen differ from those caused by non-specific birch pollen, the term should be included.

When specifying the details, it should also be noted that special terminologies can be used for narrowly defined domain contexts. In medicine, for example, there is the Foundational Model of Anatomy Ontology (FMA). This maps the human anatomy down to the smallest detail (each tooth has its own nerves and blood vessels with their own names). Another example is the Gene Ontology, which describes all genes. There are also many other highly specialized ontologies.

Finally, the "concept model" of an ontology provides a further framework for detailing. If the role "Has color" exists in it, then this should also be maintained. This can mean an enormous amount of work, lead to a high level of complexity, and ultimately have little benefit. It must therefore already be clear during the development of the concept model what the ontology is to be used for.

Not everything belongs in an ontology
French fries with mayo and ketchup

Even "Pommes Schranke" can be mapped with SNOMED CT.

(Image: Generated with playground.com by André Sander)

If you have a look around in large ontologies, you will find amazing things: SNOMED CT, for example, not only contains hundreds of weapons (including thermonuclear bombs), but also almost 2000 species of fish - even deep-sea fish, which are extremely unlikely to come into contact with humans. In addition to UFOs, there are also at least seven terms relating to different types of mobile homes. Even if motorhomes are currently all the rage, this is difficult to understand in medical terms. The situation is different with socio-demographic terms, which are becoming increasingly important for algorithms. However, you can imagine what it means when domains such as income, education, leisure activities (i.e. motorhomes after all!) etc. are integrated into an ontology.

Closely linked to the expressiveness of an ontology are abnormalities that need to be mapped. In the case of congenital diseases, for example, openings may be closed (atresia) or anatomical structures may not be formed at all and are therefore missing (agenesis). There are also many abnormalities, such as dextrocardia, in which the heart is located on the right side of the body, ectopias, in which organs are located outside their intended location, or additional body parts, such as hexadactyly. Depending on the DL used, these special features can be depicted more or less well.

Visualization of a missing rib in SNOMED CT

(Image: André Sander)

Similar to the previous point, idioms are "human characteristics". However, they do not refer to the object that is to be described, but to the user who describes the object. This refers to terms that are used colloquially and usually incorrectly. The best-known example is the German "Blinddarmentzündung" (literally "inflammation of the blind gut"), which is used synonymously with "appendicitis". However, the "Blinddarm" is not the appendix; doctors refer to it as the "caecum", the last part of which is the appendix. This in turn is called the "Wurmfortsatz" (vermiform appendix) in German. In this respect, the correct German term for "appendicitis" would be "Wurmfortsatzentzündung".

Another classic example, cited by Alan Rector, is "endocrine surgery", which physicians understand as surgery on the endocrine organs. Both the male and female reproductive organs are endocrine organs, yet no doctor would refer to operations on them as "endocrine operations". SNOMED CT is somewhat undecided and includes operations on the female reproductive organs (ovaries) as "endocrine operations", but not those on the male reproductive organs (testicles).

Some idioms are certainly also culturally determined and therefore regionally limited – this makes mapping in ontologies largely impossible. Nevertheless, acceptance by users can only be achieved if such particularities are also considered.

Note: Part 3 of this series describes some typical use cases of terminologies and ontologies.

About the author: Dr. André Sander completed his doctorate in medical sciences at the Charité and has been working with medical terminologies and ontologies for more than 25 years. At ID Information und Dokumentation he is CTO, authorized signatory and member of the management board.
