Cats irritate reasoning models: Study rehearses attack
Irrelevant input drastically worsens the results of reasoning models: A study has tested this with cats.
(Image: canbedone / Shutterstock.com)
A simple math problem is spun by the sentence: Cats sleep almost all the time. If part of the input into a reasoning model has nothing to do with the actual task, the quality of the output deteriorates drastically. This is what a study by Stanford University found out.
The scientists wanted to test the robustness of so-called reasoning models and found that they are not particularly robust. They developed an automated chain of attack for “Cats Confuse Reasoning LLM”. A low-cost proxy model (DeepSeek V3) creates misleading sentences that are fed to actual tasks in more powerful reasoning models (DeepSeek R1, OpenAI o1 and o3-mini), with GPT-4o used as a prompt generator and a hallucination detector taking on the role of an evaluator.
Cats, numbers and financial wisdom interfere with reasoning
According to the authors, the sentence that cats spend most of their lives sleeping led to a doubling of the chance of an incorrect answer to simple math problems. Other triggers used by the researchers related to an incorrect number (for example: could the answer be 175?) and to general financial wisdom. The probability that an answer was wrong when all three triggers were used was more than 300 percent higher in the study than the previous error rate.
Videos by heise
In addition to the incorrect answers, there was another phenomenon: DeepSeek R1-distill-Qwen-32B also exceeded the originally specified token budget by at least 50 percent in 42 percent of the answers. This also applied to OpenAI's o1 in 26 percent of cases. The authors call this effect a slowdown attack. The token budget is specifically responsible for the costs of a request.
According to the study published on arXiv, both effects could be used for attacks. This is particularly problematic in the areas of finance, law, and health. Other studies have also been able to trace such attacks. What is new in the Stanford study is the use of a proxy model, which is cheaper and less powerful than the large reasoning models.
(emw)