Between Speed and Sustainability: AI Agents in Software Development

How AI agents accelerate processes and turn software developers into architects of complex modernization.

[Image: Two fingers touching – one from a human hand, one from an artificial hand. (Ole.CNX / Shutterstock.com)]

By Venko Kisev and Marius Wichtner

Agentic Coding has become an integral part of everyday development. AI agents plan, implement, test, and document code in a short time. Teams orchestrate specialized assistants instead of writing every change by hand. Alongside AI assistants like GitHub Copilot and chat models like ChatGPT, autonomous agents are emerging that take over entire development steps.

Marius Wichtner

Marius Wichtner is a software engineer at MaibornWolff and is intensively involved in software architecture, clean code, and the use of generative AI in the development process. In various customer projects, he develops strategies for integrating agents into existing systems in a meaningful way, with a particular focus on sustainable code quality and developer experience.

The pace is increasing, but so is the need for guardrails. Functionally correct code is not enough if security requirements, latency targets, scalability, and architectural conventions are ignored. Without sufficient contextual knowledge, systems gradually drift away from their original architecture, tests lose their accuracy, and technical debt grows.

Venko Kisev

Venko Kisev heads up the Software Health Check and Modernization department at MaibornWolff. Over the past 15 years, he has led numerous analysis and modernization projects with his team and worked with leading companies from various industries – increasingly focusing on the meaningful and responsible use of AI.

How can agents be integrated into existing software landscapes, and where do they hit their limits with non-functional requirements? What roles and processes do they take on in practice, and which checks in pipelines help to combine speed and sustainability? This article addresses these questions.

Briefly explained: Agentic Coding

Agentic Coding refers to development with AI agents, individually or in combination with several specialized agents, depending on the tasks in the project, the size of the codebase, and the type of change to be implemented. Unlike assistive tools, they do not just provide point-in-time support but largely autonomously implement defined goals. They plan tasks, change code, generate tests, document decisions, and perform correction loops independently. The human defines goals and rules, reviews the results, and makes architectural decisions.

In addition to GitHub Copilot and the chat models from Anthropic and OpenAI (ChatGPT), there is now a new class of tools: autonomous agents. They take over entire development steps from planning to integration. Today, they can be found as IDE integrations, for example in Visual Studio Code forks like Windsurf and Cursor, as terminal-integrated agents like Anthropic's Claude Code or OpenAI's Codex, and as open-source software like OpenCode.

These agents design APIs, orchestrate test suites, detect security vulnerabilities, or refactor legacy modules without changing their behavior. In some cases, they access external knowledge sources, document their decisions, and perform correction loops independently. This creates a new work model: developers no longer interact with individual tools but orchestrate a team of specialized agents. This shift is not only happening on a technical level but also in the way projects are structured and managed.

This development opens up many possibilities, but it also shifts responsibility. Because the more operational tasks agents take on, the more important the ability to set conceptual guardrails becomes. Who decides if a solution is sustainable? Who checks if the generated code fits the target architecture? And what happens when agents contradict each other, ignore non-functional requirements, or make their own assumptions that are not desired from a business perspective?

Agentic Coding is not a linear continuation of automation but a new phase: decisions are shifting, processes are accelerating, expectations are changing. And precisely for this reason, a reflective approach is needed, not only towards the tools but also towards the attitude towards development itself.


Agentic Coding impressively demonstrates how far AI-assisted automation has come. AI agents are increasingly taking over functional tasks reliably: they design an API, validate a form, integrate a database query, often faster than a human can. However, the results are not always technically sound; only with expert knowledge and oversight of non-functional requirements do they become reliable, well documented, executable, and deployable.

Functional, however, does not mean robust. As soon as requirements come into play that go beyond what the software should do – how quickly it must respond, how secure or maintainable it should be – new gaps emerge. This is precisely where the domain of non-functional requirements begins, and with it the typical weaknesses of today's agents.

Non-functional requirements describe the "how" of a system: How performant should a service react under load? How secure are interfaces protected against attacks? How scalable is a new module if the business model changes? What architectural conventions must be adhered to in order to maintain long-term maintainable systems?

Such requirements are rarely directly derivable from a prompt. They are often implicit, context-dependent, company-specific, and dynamic. Many decisions in software have causal dependencies, for example, because another team has set requirements or chosen a specific solution. Requirements often arise over time: scalability only becomes important when user numbers grow; latency requirements when real-time functions are introduced. Such interdependencies are difficult to condense into a short prompt. Agents that operate only locally (i.e., focused on individual files or tasks) have no view of these overarching aspects.

For example, an agent generates a new REST API in record time, including tests and documentation. Technically correct, syntactically clean, but without integration into existing security mechanisms, without a logging concept, and without regard for latency or scalability. What looks like productivity at first glance quickly becomes a risk at the system level.
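The gap described above can be made concrete with a minimal sketch. All names here (`require_role`, `get_order_integrated`, the role string) are illustrative assumptions, not taken from the article or any real codebase: the first function is what an agent typically delivers, the second shows the same endpoint wired into existing security and logging conventions.

```python
import logging
import time

logger = logging.getLogger("orders")

def get_order_agent_version(order_id: str) -> dict:
    """What an agent typically produces: functionally correct,
    but no auth check, no logging, no latency budget."""
    return {"id": order_id, "status": "shipped"}

# --- existing platform conventions the agent did not know about ---

def require_role(user: dict, role: str) -> None:
    """Hypothetical existing security mechanism: reject callers
    that lack the required role."""
    if role not in user.get("roles", ()):
        raise PermissionError(f"missing role: {role}")

def get_order_integrated(user: dict, order_id: str) -> dict:
    """The same endpoint, integrated into the security mechanism,
    the logging concept, and a simple latency measurement."""
    require_role(user, "order:read")  # security integration
    start = time.monotonic()
    result = {"id": order_id, "status": "shipped"}
    logger.info("get_order took %.1f ms", (time.monotonic() - start) * 1000)
    return result
```

Both versions return the same payload; the difference only becomes visible at the system level, which is exactly why it escapes a purely functional review.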

The same applies to architectural decisions. Agents work contextually, but only with the information they have available. Whether a module fits into an existing layered architecture, whether existing rules are violated, or whether cyclic dependencies arise often remains unnoticed without additional control mechanisms. Wrong decisions do not creep in; they scale immediately.
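Such control mechanisms can be automated in the pipeline. The following is a minimal sketch of an architecture check, with illustrative module names and rules (domain must not import infrastructure, no two-module cycles); a real project would read sources from disk and encode its own conventions.

```python
import ast

# Illustrative module sources; in a real pipeline these come from disk.
MODULES = {
    "app.domain.order": "import app.infrastructure.db\n",
    "app.infrastructure.db": "import app.domain.order\n",
}

def imports_of(source: str) -> set:
    """Collect the module names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found

def violations(modules: dict) -> list:
    """Flag layer violations (domain -> infrastructure) and 2-cycles."""
    graph = {name: imports_of(src) for name, src in modules.items()}
    problems = []
    for name, deps in graph.items():
        for dep in deps:
            if ".domain." in name and ".infrastructure." in dep:
                problems.append(f"layer violation: {name} -> {dep}")
            if name in graph.get(dep, set()):
                problems.append(f"cycle: {name} <-> {dep}")
    return problems
```

Run as a pipeline step that fails the build on a non-empty result, such a check catches exactly the drift that individual, locally operating agents cannot see.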

An agent can implement a method correctly but not recognize that it lies in a security-critical path. It can write tests but not evaluate whether they actually secure the business-critical logic. It can refactor code without knowing that it touches a regulatory framework.

In practice, this leads to a paradoxical effect: systems are created faster, appear complete at first glance, yet generate technical debt because key quality features are missing.

A recurring pattern also emerges in terms of test coverage: while many tests are created, not necessarily the right ones. Typically, trivial cases are covered, while edge cases, error paths, or business-critical logic remain unaddressed. The illusion of test coverage does not replace a robust quality strategy.
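The pattern can be illustrated with a small sketch; `net_price` is an invented business rule, not from the article. The first test is the kind an agent tends to generate, which yields full line coverage of the happy path, while the second covers the boundaries and error paths that actually secure the logic.

```python
def net_price(gross: float, discount_pct: float) -> float:
    """Apply a percentage discount; discounts outside 0-100 are invalid."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount out of range")
    return round(gross * (1 - discount_pct / 100), 2)

def test_trivial():
    # Typical generated test: happy path only. The coverage report
    # looks complete, but it says little about correctness.
    assert net_price(100.0, 10) == 90.0

def test_edges():
    # The tests that secure the business logic: boundaries and
    # error paths that coverage numbers alone do not demand.
    assert net_price(100.0, 0) == 100.0      # no discount
    assert net_price(100.0, 100) == 0.0      # full-discount boundary
    rejected = False
    try:
        net_price(100.0, 101)                # invalid input
    except ValueError:
        rejected = True
    assert rejected
```

Both test functions produce the same green check mark in CI, which is precisely why coverage metrics alone cannot replace a deliberate quality strategy.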

In addition, new challenges arise at the meta-level: Who controls the output? What role does human review play when agents generate dozens of commits per hour? Which metrics help with evaluation? And how can teams prevent technical debt from accumulating automatically?


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.