Questionable "research" from generative AI on Google Scholar
Secretly AI-generated “research results” risk undermining science and thereby threaten society, Swedish researchers warn.
(Image: RitaE, public domain)
Swedish researchers are sounding the alarm: secretly AI-generated “research results” are appearing in Google Scholar, various databases, and even peer-reviewed journals. The flood of false information could overwhelm quality control in the sciences and thus jeopardize the integrity of the scientific record, the Swedes warn.
Beyond overwhelming quality control, generative artificial intelligence could be used to deliberately create misleading documents that appear convincingly scientific and are optimized to rank highly in public search engines, especially Google Scholar. That possibility undermines trust in the sciences and poses a serious threat to society. Finally, false “results” could be planted to steer a society or its decision-makers toward certain decisions.
Obviously faked documents investigated
A group of three scientists from Borås University of Applied Sciences and one from the Swedish University of Agricultural Sciences drew a sample of ostensibly scientific documents from Google Scholar. They set the bar deliberately low: they downloaded a document only if it contained at least one of two telltale phrases typical of the outputs of GPT-3.5 and GPT-4.
These telltale outputs were “as of my last knowledge update” and “I don't have access to real-time data”. Via Google Scholar, the researchers downloaded 227 such search results. They were able to exclude 88 of them because they either disclosed the use of generative AI or used it legitimately in some other way. That left a sample of 139 documents that were at least partially AI-generated without disclosure, and whose disseminators had not exercised even the most basic diligence in concealing their approach.
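The screening step described above amounts to a case-insensitive substring search for the two boilerplate phrases. A minimal sketch of that idea, assuming plain-text input (this is an illustration, not the researchers' actual pipeline; note that a hit only flags a document for manual review, since the study still checked each result by hand for disclosed or legitimate AI use):

```python
# Telltale phrases from GPT-3.5/GPT-4 boilerplate that the study
# used to flag candidate documents (stored lowercase for matching).
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "i don't have access to real-time data",
]

def flag_gpt_boilerplate(text: str) -> list[str]:
    """Return the telltale phrases found in `text`, case-insensitively.

    A non-empty result marks the document for manual review only;
    it does not by itself prove undisclosed AI generation.
    """
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

# Hypothetical example document text:
sample = ("The Atlantic salmon population has declined. As of my last "
          "knowledge update, no survey covers the most recent season.")
print(flag_gpt_boilerplate(sample))  # ['as of my last knowledge update']
```

In practice, PDFs would first need text extraction, and curly versus straight apostrophes in phrases like “don't” would have to be normalized before matching.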
Analysis of the sample showed that almost one in seven of these documents had appeared in a reputable scientific journal, and almost two thirds in other scientific journals. Just under a seventh were student papers from university databases, and only a small share were working papers. Topically, computer science, the environment, and health dominated; the single most common subject of the fake research examined was fish and fish farming. The researchers also found several of the dubious works elsewhere, for example on ResearchGate, IEEE, various websites, and in social networks.
Suggestions
The Swedish researchers do not have a simple solution either. They consider simultaneous approaches to technology, education, and regulation to be necessary. It will not be enough to recognize fraudulent works; it is also important to understand how they reach their audience and why some remain in circulation.
It would therefore be helpful if search engines offered filtering options, for example according to certain classes of scientific journals or peer-reviewed sources. The search index would have to be created transparently and be based on scientific criteria. “Since Google Scholar has no real competitor, there are strong reasons to establish a freely available, general scientific search engine that is not operated for commercial reasons but in the public interest,” the authors recommend.
Primarily “not a technical problem”
“It is important not to present this as a technical problem that exists only because of AI text generation,” they add. Rather, the issue should be addressed in the context of the “broken” scientific publishing system and ideological battles over the control of knowledge. The proposals also include raising awareness of the issue, particularly among decision-makers and disseminators such as journalists.
The Swedish study “GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation” was peer-reviewed and published in September in the Harvard Kennedy School's journal Misinformation Review. The aim of the study was not to measure the problem statistically but to point to the tip of the iceberg. “Our analysis shows that questionable and potentially manipulative papers fabricated with GPT are permeating the research infrastructure and are likely to become a widespread phenomenon,” the Swedes write. “Our findings underscore that the risk of false scientific papers being used as maliciously manipulative evidence must be taken seriously.”
(ds)