AI code increasingly leads to production failures

The rapid introduction of AI-generated code is increasingly leading to production failures, according to a survey.

at 1:21 pm CEST

By Daniel Herbig

For a study by the software company CloudBees, more than 200 technology executives were surveyed about the use of AI in their companies. Of those, 81 percent reported problems after deployment that were related to AI-generated code, such as functional errors, security vulnerabilities, and performance issues. In addition, 63 percent reported compliance violations caused by the AI. Some of these also made it into production.

According to CloudBees' survey results, one issue appears to be that testers can no longer keep up with validating AI code. In response, 62 percent increased automated testing and 30 percent added more manual verification steps. However, only half believe that the formal review processes for AI code are actually applied every time in their company. For many, managing the test environment has become a greater burden than writing the code itself.

Reports are piling up

This is in line with a growing number of reports of problems with AI code. Amazon has already experienced repeated issues attributed to code from AI assistants. The study “Coding on Copilot” by GitClear also suggests that the increasing use of AI programming aids could impair code quality. Citing current studies, the Fraunhofer Institute for Experimental Software Engineering likewise emphasizes the importance of new, thorough control mechanisms for AI code. Otherwise, errors are to be expected.

A central conflict between AI agents and software stability lies in the lack of determinism in generative AI models. Classic software engineering is based on systems that deliver identical results for identical inputs. Generative AI, on the other hand, works with probabilities and can produce different variants of the same code even when the underlying logic stays the same. This stochastic behavior causes problems particularly where 100 percent accuracy is a central criterion, for example in security-critical development environments.
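The contrast between deterministic and probabilistic behavior can be illustrated with a minimal sketch. It is not taken from the study: the function names, the candidate labels, and the scores are hypothetical, and the sampler is a toy stand-in for how a generative model might pick among equivalent code variants.

```python
import math
import random
from typing import Dict, Optional, Set


def deterministic_checksum(data: str) -> int:
    """Classic function: the same input always yields the same output."""
    return sum(ord(ch) for ch in data) % 256


def sample_variant(scores: Dict[str, float], temperature: float,
                   rng: random.Random) -> str:
    """Draw one variant with probability proportional to exp(score / temperature)."""
    variants = list(scores)
    max_score = max(scores.values())  # subtract the maximum for numerical stability
    weights = [math.exp((scores[v] - max_score) / temperature) for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]


# Hypothetical scores for three functionally equivalent ways to write a loop.
CANDIDATE_SCORES = {"for-loop": 2.0, "list-comprehension": 1.8, "map-call": 1.0}


def sample_many(n: int, seed: Optional[int] = None) -> Set[str]:
    """Collect the distinct variants seen across n independent draws."""
    rng = random.Random(seed)
    return {sample_variant(CANDIDATE_SCORES, 1.0, rng) for _ in range(n)}


if __name__ == "__main__":
    # Deterministic: repeated calls agree.
    print(deterministic_checksum("abc") == deterministic_checksum("abc"))
    # Probabilistic: repeated draws for the same "prompt" can differ.
    print(sample_many(20))
```

Fixing the seed makes this toy sampler repeatable, which is why tests of such code typically pin the random source; without a seed, two runs over the same input may return different variants.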