AI in healthcare: "Hardly any data set is free from bias"
A solid database is of great importance for AI training, especially in the healthcare sector. Theresa Ahrens explains in an interview what is still lacking.
(Image: ArtemisDiana/Shutterstock.com)
With a comprehensive and diverse database, better results can be achieved when training AI systems in the healthcare sector. A database that does not represent the entire population or target group leads to biased AI. Theresa Ahrens from the Digital Health Engineering department at Fraunhofer IESE explains in an interview why balance is important and what other options are available.
(Image: Fraunhofer IESE)
heise online: One-sided data sets are problematic. Why is that the case?
Gender in particular, along with aspects such as ethnic origin, is a source of AI bias. But it can be said that there is hardly any data set that is completely free of bias. We simply have to be aware of this. The data that is available in the health sector is mainly that of heterosexual, older, white men. Women, children and people of color are all underrepresented.
Time and again, studies show that decisions made by AI systems for these groups of people in the healthcare sector are significantly worse. Medical research was and still is heavily biased towards men. The bias in the data basis is then of course automatically transferred to AI systems and their recommendations.
The task of research is then to investigate the bias resulting from the distorted data basis and to set up the AI systems as well as possible and normalize the data sets.
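One simple way to start investigating such bias is to compare a model's accuracy separately for each population group. The following is only a minimal sketch: the function name `accuracy_by_group`, the group labels and the toy records are hypothetical and do not come from the interview.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per group from (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        if y_true == y_pred:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records, invented purely for illustration
records = [
    ("men", 1, 1), ("men", 0, 0), ("men", 1, 1), ("men", 0, 0),
    ("women", 1, 0), ("women", 0, 0),
]

per_group = accuracy_by_group(records)
# Difference between the best-served and worst-served group
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A large gap between groups does not prove where the bias comes from, but it indicates that the data basis and the model deserve a closer look for the disadvantaged group.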
heise online: How is normalization then carried out?
First of all, it must be emphasized once again that the goal should actually be to have a database that is not biased. However, if it is discovered that there are systemic distortions, various approaches can be taken to reduce them. For example, synthetic data sets can be generated and underrepresented population groups can be supplemented with realistic data. In addition, new methods are still being developed as this problem is common and challenging.
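As a rough illustration of supplementing underrepresented groups, the sketch below uses plain random oversampling, that is, duplicating existing records of smaller groups until all groups are the same size. This is a much simpler stand-in for the realistic synthetic data generation mentioned in the answer, not the method used at Fraunhofer IESE. The function name `oversample_to_balance` and the record layout are assumptions made for this example.

```python
import random
from collections import defaultdict


def oversample_to_balance(rows, group_key, seed=0):
    """Duplicate rows of smaller groups until every group matches the largest one."""
    if not rows:
        return []
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw duplicates with replacement until the group reaches the target size
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical toy data: 8 records from men, 2 from women
rows = [{"sex": "m", "id": i} for i in range(8)]
rows += [{"sex": "f", "id": i} for i in range(8, 10)]
balanced = oversample_to_balance(rows, "sex")
```

Duplicating records adds no new information about the underrepresented group, which is why collecting an unbiased database remains the actual goal.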
Is it also possible that the AI is being trained too much with the same data?
That is certainly possible. In this case, the training data set is not optimally aligned with the target group and does not represent it. Another phenomenon here is overfitting, where the AI system has been fitted so closely to the training data set that it no longer generalizes to new data. You must always keep an eye on overfitting and make sure that the training data set and the AI training itself are aligned with each other. Overfitted AI systems often fail when realistic data from everyday medical practice is used for the first time. For example, this data may have more background noise or deviate in other ways. Therefore, the data sets for AI development should always reflect the data used in routine use as accurately as possible.
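One common way to keep an eye on overfitting is early stopping: the loss on a held-out validation set is tracked during training, and training stops once it no longer improves. The sketch below assumes the validation losses per epoch are already available. The function name `best_epoch`, the `patience` parameter and the numbers are hypothetical.

```python
def best_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping after `patience`
    consecutive epochs without a new lowest validation loss."""
    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            # Validation loss has not improved for `patience` epochs
            break
    return best_idx


# Hypothetical validation losses: improvement stalls after the third epoch
val_losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.5]
stop_at = best_epoch(val_losses, patience=2)
```

This only helps if the validation data resembles the data seen in routine use. A validation set drawn from the same unrepresentative source as the training set will not reveal the failures described above.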
Who is responsible?
It is the responsibility of researchers and AI manufacturers to monitor AI systems and ensure quality management. Approving authorities should also carry out appropriate checks.
There are currently great efforts to collect as much data as possible for AI systems. How long do you think it will take until enough data is available?
I can't say. But the way things are going now, I would assume that I won't benefit from it in my lifetime, especially because time series are often required. This means that data is needed from one person over several years. This type of data is needed for predicting diseases, for example. I would therefore strongly advocate accelerating these projects. A lot of data is collected, but most of it is stored in silos and is not accessible.
There are plans to transfer data from the electronic patient file to the Health Research Data Center from mid-2025. Is that a glimmer of hope for you?
Yes, it is, and I am also impatient. But even then, we will only have data points from 2025 onward. In the medical field, longitudinal studies are often carried out over a lifetime and preferably over generations. It would then be particularly interesting to obtain health data from families. In this respect, the Health Research Data Center is definitely a step in the right direction.
Denmark, Norway and Sweden already have national databases that are much more advanced. In situations like the coronavirus crisis, this data can be analyzed more quickly and the effects of measures can be better assessed. Here, for example, a study appeared relatively quickly showing that extremely premature births and stillbirths decreased during the lockdown.
The European Health Data Space is also set to bring this about, with data being stored for up to 100 years. Will that help?
There is still a long way to go. But the EHDS is definitely an important step at European level, especially for international cooperation on health issues. However, European health data is not necessarily transferable 1:1 to Germany. For example, a recent study shows that the so-called Psychometric Hepatic Encephalopathy Score (PHES) is not directly transferable between Germany and Denmark, so new Danish standard values have been defined. This example shows that there are differences, even if the population groups are supposedly very similar.
Comparisons between countries are sometimes helpful, but there are also simply differences of a cultural nature. In Norway, for example, people are incredibly active and spend more time outdoors, which naturally has a positive effect on their health. Diet is also a factor, but other living conditions such as the climate are also decisive. How long do people spend in the sun and in which places? What kind of preventive care is offered by health insurance companies? What treatment recommendations are there or what medication? This varies from country to country and even from health insurance fund to health insurance fund in Germany.
There is also currently a debate about whether AI systems that have already been developed can simply be transferred between different healthcare systems, for example from the United States or Asian countries to Europe, given that there are cultural differences and the healthcare systems are different. In regulatory terms, these systems are separate anyway. This means that they have to go through the approval process again. In addition, the AI outputs have to be quality-assured again for the new target group.
You definitely need a good national database, but you can also benefit greatly from international data. The more data is available, the lower the potential AI bias. So far, however, the data situation in the healthcare sector in Germany is rather miserable. Getting to grips with this is a mammoth task. Merging, standardizing and harmonizing the data are also challenges. However, the potential of the data volumes clearly outweighs this.
Would it be enough to do more advertising for research projects?
Science certainly needs to take a step towards society here and push ahead with science communication, not least to reduce data protection concerns. In any case, data donations are also an important building block. Here too, quality assurance of the data or appropriately adapted data management in the projects would be important.
Different data sets are required depending on the research question. The research question must be very well thought out. Diverse teams also help. With more diverse teams, for example, the first female crash test dummy might not have been created only recently. The diversity of society must be considered. This is possible with a correspondingly diverse database and diverse research teams.
What about the quality of the data?
In the case of professionally managed medical registers, quality is ensured by the operators. In the case of data from electronic patient records and the European Health Data Space, the quality will probably vary greatly between individuals or countries, especially at the beginning.
From a societal perspective, it would be helpful if people considered what they upload to the electronic patient record (ePA) and also had the social benefits clearly communicated to them. It would be ideal if data collection in the ePA were integrated into the various processes as automatically as possible. Filling the ePA must not become an additional burden for patients or the various healthcare professions.
Which data sets that are already available do you use? What research projects are you planning?
The MIMIC dataset (MIMIC-III Clinical Database v1.4) for intensive care patients, for example, is very well structured and is frequently used internationally. We also have access to our own research. There are also corresponding intensive care data sets from Europe. This is because a lot of data is generated in intensive care units, as patients' vital signs are monitored extensively and continuously. This also shows that such routine data and, above all, access to it are very valuable for research.
(mack)