No more calculation errors? AI checks papers for errors – Why that's not enough

Researchers are developing AI tools to detect errors in scientific publications. However, this innovation also harbors risks and limitations.

Mathematical formulas with errors, being corrected by hand with a red pencil.

(Image: worker/Shutterstock.com)


The idea sounds impressively simple: artificial intelligence should find errors in the content of scientific publications – preferably during the review phase. Two promising projects have already achieved initial success, reports the scientific journal Nature.

The "Black Spatula Project" is an open-source project that has so far analyzed around 500 articles for errors. The project's name goes back to a journal article claiming that black plastic cooking utensils contained worrying amounts of carcinogenic flame retardants. The study caused quite a stir – but it contained a simple calculation error. Although the error was quickly discovered and has since been corrected, it initially led to numerous alarmist media reports.

The second project is called YesNoError. According to Nature, the AI tool has already analyzed more than 37,000 articles in the two months since its launch. The articles in which errors have been found are listed on its website, although most of them still need to be checked by humans.

The idea is similar in both cases: both the Black Spatula Project and YesNoError use large language models (LLMs) to detect various kinds of errors in the articles: incorrectly cited facts, calculation errors, methodological errors, and errors in references to scientific sources.

This is actually the job of peer review: the examination of scientific articles by experts before publication. However, this process has been criticized for some time because it is slow, sometimes interest-driven – and does not find all errors.

Black Spatula is not yet a finished tool. The group working on the project is still testing which approach yields the best results. To do this, the researchers are collecting freely accessible papers with proven errors, prompts for various large language models, and the corresponding model outputs.

The researchers test the prompts with models such as OpenAI's o1 or Anthropic's Claude 3.5. The cost of analyzing an individual paper ranges from 15 cents to several US dollars, depending on the paper's length and the prompt sequence used. According to the Nature report, the Black Spatula Project's false-alarm rate is around ten percent.
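The workflow described – sending a paper's text together with an error-checking prompt to a language model and parsing a structured verdict – can be sketched roughly as follows. The prompt wording, the function names, and the canned response standing in for a real API call are all illustrative assumptions, not the project's actual code:

```python
import json

# Hypothetical error-checking prompt; the Black Spatula Project's real
# prompts are not published in the Nature report.
CHECK_PROMPT = (
    "You are reviewing a scientific paper. List every calculation, "
    "citation, or methodology error you find. Respond with a JSON list "
    "of objects with the keys 'type', 'location', and 'explanation'."
)

def build_request(paper_text: str) -> list[dict]:
    """Assemble a chat-style request for one error-checking run."""
    return [
        {"role": "system", "content": CHECK_PROMPT},
        {"role": "user", "content": paper_text},
    ]

def parse_findings(model_output: str) -> list[dict]:
    """Parse the model's JSON answer; treat unparseable output as no findings."""
    try:
        findings = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    return [f for f in findings if isinstance(f, dict) and "type" in f]

# A canned response stands in for the actual model call.
fake_output = (
    '[{"type": "calculation", "location": "Table 2", '
    '"explanation": "Row sums do not match the stated total."}]'
)
print(parse_findings(fake_output))
```

The roughly ten percent false-alarm rate mentioned above is why the parsed findings would still go to a human reviewer rather than being published directly.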

YesNoError claims to have agents that work with OpenAI's o1 and are trained using synthetic data to find specific types of errors – such as calculation errors. The proprietary tool then applies several of these agents in parallel to the paper to be checked and uses the language model to combine the results of the individual agents into a consistent overall result.
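The architecture described – several specialized checkers run in parallel, with their findings merged afterwards – could look something like this in outline. The agent names, the keyword-based stand-ins, and the simple merge step are illustrative assumptions; YesNoError's actual implementation is proprietary and would call trained language-model agents instead:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-ins for specialized agents; the real tool reportedly
# uses model agents trained on synthetic data for each error type.
def check_calculations(paper: str) -> list[str]:
    return ["Table 1: percentages sum to 104%"] if "104%" in paper else []

def check_citations(paper: str) -> list[str]:
    return ["Ref. [12] does not support the cited claim"] if "[12]" in paper else []

AGENTS = [check_calculations, check_citations]

def review(paper: str) -> list[str]:
    """Run all agents in parallel and merge their findings into one list."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda agent: agent(paper), AGENTS)
    merged: list[str] = []
    for findings in results:
        merged.extend(findings)
    return merged

print(review("...the percentages sum to 104% ... as shown in [12] ..."))
```

In the real system, the merge step itself is handled by a language model that reconciles the agents' outputs into one consistent overall result, rather than by simple concatenation as sketched here.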

To confirm that the flagged errors really are errors, the AI tools' findings must be checked by humans – preferably, of course, by experts in the relevant field. Finding these experts is the project's biggest bottleneck, Steve Newman, founder of the Black Spatula Project, told Nature.

YesNoError plans to solve this problem with financial incentives. The project is working with ResearchHub, an online platform founded to accelerate digital publication and collaboration among researchers. In 2024, the platform introduced its own cryptocurrency to reward contributions – reviewing a paper, writing code, and so on. YesNoError also wants to use this mechanism to have the results of its tool checked.

This article first appeared on t3n.de.

(mack)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.