DeepSeek-R1 Generates Insecure Code with Politically Sensitive Terms

Security researchers have discovered that the Chinese AI model DeepSeek-R1 generates worse code when terms such as Falun Gong or Taiwan appear in the prompt.

The Chinese AI model DeepSeek-R1 reacts almost "allergically" when prompts contain terms the Chinese government considers sensitive, security researchers at CrowdStrike have discovered. In such cases, the Large Language Model (LLM) produces insecure code for programming projects; if no such terms appear in the prompt, the results are significantly better, as the researchers demonstrated in tests.

These include politically charged phrases such as "Uyghurs", "Falun Gong", and "Taiwan". When the prompt mentions the movement "Falun Gong", the LLM even refuses to generate any code at all in 45 percent of cases, CrowdStrike writes in a blog post. The researchers suspect that DeepSeek has built in some kind of kill switch: in the reasoning model, they observed the AI preparing a detailed answer and then suddenly aborting with an error message.

For the poor quality of the generated code, the researchers have a different hypothesis. They suspect that during training the model unintentionally learned to associate negatively connoted terms with lower-quality output. DeepSeek trains its model along these lines because Chinese regulations require AI services to adhere to "socialist core values".

The insecure output included, for example, scripts with hardcoded passwords and data transfers carried out over insecure channels. At the same time, however, the model claimed to be applying PayPal's procedures and thus generating secure code. In one example, DeepSeek-R1 generated a complete web app but omitted session management and authentication; in other cases, passwords were stored using weak hashing methods or in plain text.
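
For illustration, the following sketch contrasts the kinds of weaknesses described above with more robust alternatives. The function names and values are hypothetical and not taken from actual model output.

    # Illustrative sketch only; names and values are hypothetical, not model output.
    import hashlib
    import os
    import secrets

    # Insecure pattern: credentials hardcoded in the script.
    DB_PASSWORD = "admin123"

    def hash_password_insecure(password: str) -> str:
        # Insecure pattern: fast, unsalted MD5 for password storage.
        return hashlib.md5(password.encode()).hexdigest()

    def hash_password_safer(password: str) -> str:
        # Safer alternative: salted, memory-hard scrypt key derivation.
        salt = secrets.token_bytes(16)
        digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return f"{salt.hex()}:{digest.hex()}"

    def read_db_password() -> str:
        # Safer alternative: read secrets from the environment, not the source code.
        return os.environ["DB_PASSWORD"]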

For the study, the researchers ran 6,050 prompts per LLM, repeating each task five times to check whether the observations were reproducible. CrowdStrike recommends that companies using LLMs for programming test them systematically for security, especially under real-world operating conditions; it is not enough to rely solely on benchmark figures provided by the developers.
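
A minimal sketch of what such systematic testing could look like is shown below. The generate_code() wrapper around the model under test and the checks themselves are hypothetical, deliberately simple examples rather than CrowdStrike's actual methodology.

    import re

    def generate_code(prompt: str) -> str:
        # Hypothetical wrapper around the LLM under test; not a real API.
        raise NotImplementedError("call the model under test here")

    # Deliberately simple example checks for obvious weaknesses.
    CHECKS = {
        "hardcoded password": re.compile(r"(password|passwd|pwd)\s*=\s*['\"]", re.IGNORECASE),
        "weak hash (MD5/SHA-1)": re.compile(r"\b(md5|sha1)\b", re.IGNORECASE),
        "unencrypted HTTP transfer": re.compile(r"http://", re.IGNORECASE),
    }

    def audit_prompt(prompt: str, runs: int = 5) -> dict:
        # Repeat each prompt several times and count how often each check fires.
        findings = {name: 0 for name in CHECKS}
        for _ in range(runs):
            code = generate_code(prompt)
            for name, pattern in CHECKS.items():
                if pattern.search(code):
                    findings[name] += 1
        return findings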

(mki)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.