Between Speed and Sustainability: AI Agents in Software Development
How AI agents accelerate processes and turn software developers into architects of complex modernization.
- Venko Kisev
- Marius Wichtner
Agentic Coding has become an integral part of everyday development. AI agents plan, implement, test, and document code in a short time. Teams orchestrate specialized assistants instead of writing every change by hand. Alongside AI assistants like GitHub Copilot and chat models like ChatGPT, autonomous agents are emerging that take over entire development steps.
The pace is increasing, but so is the need for guardrails. Functionally correct code is not enough if security requirements, latency targets, scalability, and architectural conventions are ignored. Without sufficient contextual knowledge, systems gradually drift away from their original architecture, tests lose their informative value, and technical debt grows.
How can agents be integrated into existing software landscapes, and where do their limits lie with non-functional requirements? Which roles and processes have proven effective in practice, and which checks in pipelines help to combine speed with sustainability? This article addresses these questions.
Agentic Coding refers to development with AI agents, used individually or as a combination of several specialized agents, depending on the tasks in the project, the size of the codebase, and the type of change to be implemented. Unlike assistive tools, they do not just provide point-in-time support but largely autonomously implement defined goals. They plan tasks, change code, generate tests, document decisions, and run correction loops independently. The human defines goals and rules, reviews the results, and makes architectural decisions.
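This working model can be pictured as a loop of planning, implementation, testing, and correction. The following Python sketch is purely schematic: the `llm` helper is a hypothetical placeholder, not the API of any real agent framework, and real agents such as Claude Code or Codex wrap this loop in far more tooling (sandboxing, context retrieval, diff review).

```python
import subprocess


def llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to a language model."""
    raise NotImplementedError("wire up a real model provider here")


def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and report (passed, output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def agent_loop(goal: str, max_iterations: int = 3) -> None:
    # The agent plans, implements, tests, and corrects on its own;
    # the human sets the goal and reviews the result afterwards.
    plan = llm(f"Break this goal into implementation steps: {goal}")
    for _ in range(max_iterations):
        llm(f"Apply the next step of this plan to the codebase: {plan}")
        passed, output = run_tests()
        if passed:
            return
        # Correction loop: feed the test failures back as new context.
        plan = llm(f"Tests failed:\n{output}\nRevise the plan: {plan}")
```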
From Autocompletion to Architectural Responsibility
In addition to GitHub Copilot and the chat models from Anthropic and OpenAI (ChatGPT), there is now a new class of tools: autonomous agents. They take over entire development steps from planning to integration. Today, they can be found as IDE integrations, for example in Visual Studio Code forks like Windsurf and Cursor, as terminal-integrated agents like Anthropic's Claude Code or OpenAI's Codex, and as open-source software like OpenCode.
These agents design APIs, orchestrate test suites, detect security vulnerabilities, or refactor legacy modules without changing their behavior. In some cases, they access external knowledge sources, document their decisions, and perform correction loops independently. This creates a new work model: developers no longer interact with individual tools but orchestrate a team of specialized agents. This shift is not only happening on a technical level but also in the way projects are structured and managed.
This development opens up many possibilities, but it also shifts responsibility. The more operational tasks agents take on, the more important it becomes to set conceptual guardrails. Who decides whether a solution is sustainable? Who checks whether the generated code fits the target architecture? And what happens when agents contradict each other, ignore non-functional requirements, or make assumptions of their own that are undesirable from a business perspective?
Agentic Coding is not a linear continuation of automation but a new phase: decisions are shifting, processes are accelerating, expectations are changing. Precisely for this reason, a reflective approach is needed, not only toward the tools but also toward one's attitude to development itself.
Where Agents Hit Limits: Non-Functional Requirements
Agentic Coding impressively demonstrates how far AI-assisted automation has come. AI agents increasingly take over functional tasks reliably: they design an API, validate a form, integrate a database query, often faster than a human could. However, the results are not always technically sound; only with expert knowledge and oversight of non-functional requirements do they become reliable, well documented, executable, and deployable.
Functional does not mean robust. As soon as requirements come into play that go beyond the mere "what the software should do" – such as how quickly it must respond, or how secure and maintainable it should be – new gaps emerge. This is precisely where the domain of non-functional requirements begins, and with it the typical weaknesses of today's agents.
Non-functional requirements describe the "how" of a system: How performant should a service be under load? How well are interfaces protected against attacks? How scalable is a new module if the business model changes? Which architectural conventions must be followed to keep systems maintainable in the long term?
Such requirements are rarely directly derivable from a prompt. They are often implicit, context-dependent, company-specific, and dynamic. Many decisions in software have causal dependencies, for example, because another team has set requirements or chosen a specific solution. Requirements often arise over time: scalability only becomes important when user numbers grow; latency requirements when real-time functions are introduced. Such interdependencies are difficult to condense into a short prompt. Agents that operate only locally (i.e., focused on individual files or tasks) have no view of these overarching aspects.
Example: Risk Instead of Productivity
For example, an agent generates a new REST API in record time, including tests and documentation. Technically correct, syntactically clean, but without integration into existing security mechanisms, without a logging concept, and without regard for latency or scalability. What looks like productivity at first glance quickly becomes a risk at the system level.
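A hedged sketch of this gap, here with FastAPI as a stand-in: the first endpoint is what an agent typically delivers, the second shows the integration the system context actually demands. `require_auth` and the audit-log hook are hypothetical placeholders for company-specific mechanisms, not a real API.

```python
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()


# What the agent generates: functionally correct, syntactically clean,
# but not wired into any existing security or observability mechanism.
@app.get("/orders/{order_id}")
async def get_order_generated(order_id: int) -> dict:
    return {"id": order_id, "status": "open"}


def require_auth() -> str:
    """Placeholder for the organization's real authentication dependency."""
    raise HTTPException(status_code=401, detail="not authenticated")


# What the system context demands: the same endpoint behind the existing
# (here: hypothetical) auth mechanism, with an audit trail for the access.
@app.get("/v2/orders/{order_id}")
async def get_order_integrated(
    order_id: int, user: str = Depends(require_auth)
) -> dict:
    # audit_log(user, "read_order", order_id)  # hypothetical logging hook
    return {"id": order_id, "status": "open"}
```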
The same applies to architectural decisions. Agents work contextually, but only with the information they have available. Whether a module fits into an existing layered architecture, whether existing rules are violated, or whether cyclic dependencies arise often goes unnoticed without additional control mechanisms. Wrong decisions do not creep in slowly; they scale immediately.
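Such conventions can be made machine-checkable, so that violations surface before they scale. A minimal sketch, assuming a project with `domain` and `api` as top-level packages; dedicated tools such as import-linter for Python or ArchUnit for Java cover this far more robustly.

```python
import ast
from pathlib import Path

# Layer rule (an assumption for this sketch): the domain layer
# must never import from the API layer.
FORBIDDEN = {("domain", "api")}


def check_imports(root: Path) -> list[str]:
    violations = []
    for file in root.rglob("*.py"):
        layer = file.relative_to(root).parts[0]
        tree = ast.parse(file.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            elif isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            else:
                continue
            for name in names:
                if (layer, name.split(".")[0]) in FORBIDDEN:
                    violations.append(f"{file}: {layer} must not import {name}")
    return violations


if __name__ == "__main__":
    problems = check_imports(Path("."))
    print("\n".join(problems) or "no layer violations")
    raise SystemExit(1 if problems else 0)
```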
An agent can implement a method correctly but not recognize that it lies in a security-critical path. It can write tests but not evaluate whether they actually secure the business-critical logic. It can refactor code without knowing that it touches a regulatory framework.
In practice, this leads to a paradoxical effect: systems are created faster, appear complete at first glance, yet generate technical debt because key quality features are missing.
A recurring pattern also emerges with test coverage: many tests are created, but not necessarily the right ones. Typically, trivial cases are covered, while edge cases, error paths, and business-critical logic remain unaddressed. The illusion of test coverage does not replace a robust quality strategy.
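A minimal illustration, with a hypothetical `apply_discount` function standing in for business logic: both tests raise the coverage number, but only the second secures the boundaries and error paths of the rule.

```python
import pytest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business rule used to illustrate the point."""
    if not 0 <= percent <= 100:
        raise ValueError("discount must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_happy_path():
    # What agents typically generate: the trivial case.
    assert apply_discount(100.0, 10.0) == 90.0


def test_edges_and_error_paths():
    # What a quality strategy needs: boundaries and failure behavior.
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(100.0, 100.0) == 0.0
    with pytest.raises(ValueError):
        apply_discount(100.0, 101.0)
```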
In addition, new challenges arise at the meta level: Who controls the output? What role does human review play when agents generate dozens of commits per hour? What metrics help in evaluation? And how can technical debt be kept from growing automatically?
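One partial answer to the metrics question is automated gates in the pipeline. A minimal sketch that fails the build when line coverage drops below a threshold, assuming a Cobertura-style coverage.xml as produced by coverage.py's `coverage xml`; the threshold and report path are project-specific assumptions.

```python
import sys
import xml.etree.ElementTree as ET

THRESHOLD = 0.80  # assumed minimum line coverage


def line_rate(report_path: str) -> float:
    """Read the overall line-rate attribute from a Cobertura report."""
    root = ET.parse(report_path).getroot()
    return float(root.attrib["line-rate"])


if __name__ == "__main__":
    rate = line_rate("coverage.xml")
    print(f"line coverage: {rate:.1%} (required: {THRESHOLD:.0%})")
    sys.exit(0 if rate >= THRESHOLD else 1)
```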