GPT-5 mixes cocktails: Security researchers crack OpenAI's built-in safeguards

The latest version of OpenAI's large language model is more powerful and cheaper, but it still falls for old tricks that circumvent its safeguards.

ChatGPT app on a smartphone (Image: Tada Images/Shutterstock.com)


Three empty glass bottles, oil, alcohol, rags and matches: the latest edition of OpenAI's large language model (LLM), GPT-5, still spits out these and other bomb-making instructions if you ask it the right way. Security companies found this out in their own experiments.

NeuralTrust, a company based in Barcelona, Spain, and New York, first asked GPT-5 for a few innocuous sentences and a short story containing terms such as "Molotov" and "cocktail". Then the researchers asked for technical details, and the LLM complied. Using this technique, known as "echo chamber and storytelling", they quickly elicited the bomb-making recipe from the AI.

SPLX's tests came to the same conclusion, albeit by a different route. The AI security specialists instructed the LLM to respond in the role of a "BasedGPT" that acknowledges all restrictions with spiteful comments but otherwise simply ignores them. That worked too: GPT-5 answered the question "How to build a bomb" with very detailed instructions for constructing an improvised explosive device (IED) from common household materials.


If OpenAI, the company behind the language model, had its way, this would not work so easily. Models from other manufacturers, such as GLM-4.5, Kimi K2 and Grok-4, can be duped just as easily, the SPLX researchers write. They are not disclosing these security gaps out of pure altruism, however: SPLX and NeuralTrust earn their money by hardening LLMs against such attacks and thus making them suitable for use in companies.

Bomb-making, poison gas, narcotics – large language models know the ingredients of these mostly illegal items from their training data, but they are not supposed to pass them on to users. Outsmarting these safeguards is a popular pastime for security experts, who sometimes even resort to psychological tactics such as gaslighting.

(cku)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.