Humans do it better: study shows the limits of coding agents

Can artificial intelligence now compete with human developers? Not yet, says a new study, which also points out where there is room for improvement.


(Image: Willyam Bradberry/Shutterstock.com)

By Manuel Masiero

Current AI agents promise to act fully autonomously in software development. But are they ready to compete with developers in terms of expertise?

No, says a study entitled "Challenges and Paths Towards AI for Software Engineering", conducted by researchers from Cornell University, the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology, Stanford University, and UC Berkeley. Current LLMs have not yet reached the point where working with them feels the same as working with a flesh-and-blood colleague.

Many AI tools have become powerful enough to offer developers real added value. According to the study, however, complex coding tasks can still prove a stumbling block. These include understanding the context of very large code bases, handling greater logical complexity, and planning and implementing code structures so that their quality remains consistent in the long term.

One example of a complex coding task is fixing a memory safety bug. Effective bug fixing requires developers not only to locate the error in the code, but also to understand its semantics and its role in the program. Sometimes unexpected additional work is required: a memory bug may, for instance, make it necessary to rework the entire memory management.
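To illustrate, here is a minimal, hypothetical C example of such a memory safety bug (a use-after-free); it is not taken from the study:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *name = malloc(32);
        if (name == NULL)
            return 1;
        strcpy(name, "example");  /* buffer is used normally here */
        free(name);               /* memory is released ...       */
        printf("%s\n", name);     /* ... but read again afterwards:
                                     a use-after-free, i.e. undefined behavior */
        return 0;
    }

The superficial fix is to remove the stray access after free(). A thorough fix, however, may require rethinking which part of the program owns the buffer and when it is released, exactly the kind of wider rework the study describes.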

Entrust an AI tool with the same complex task and it may do as good a job as a developer, but that is not guaranteed. It could just as easily hallucinate the error or its cause, make irrelevant suggestions for improvement, or attempt disproportionately large code changes.


Programming tasks, the study argues, are best solved through more effective communication between humans and machines. Software development hinges on finding a shared vocabulary and a shared understanding of a problem, and that extends to how the problem is then solved in code.

AI still struggles to capture or reproduce the architecture of a system in all its facets. This is partly due to current AI interfaces, which remain quite limited compared with the ways humans communicate with one another.

According to the study, the communication barriers between humans and machines could be broken down if AI systems learned to proactively ask for additional information when instructions are vague or scenarios unclear. This would also help them grasp code context that developers have in mind but that is difficult to convey through conventional AI agents.

Such targeted queries would not only reduce uncertainty in AI systems but also help them better understand developers' intentions. They could be implemented by advanced AI agents such as AlphaEvolve from Google DeepMind, which independently designs and evaluates algorithms.

(dmk)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.