AI assistants help developers improve the quality of their code
According to a GitHub study, code written with AI support passes more unit tests and reviewers consider it to be of higher quality.
(Image: created with Dall-E by iX)
Copilot, GitHub's AI coding assistant, has been available for two years, which prompted the company to conduct an extensive study on code quality. It found that code created with AI assistance passes more unit tests and contains fewer errors. Reviewers also rated it as more readable and reliable.
GitHub gave 202 Python developers with at least five years of professional experience the task of writing an API endpoint for a web server, specifically a rating system for restaurants. 104 of them were allowed to use GitHub Copilot; 98 were not allowed to use any AI assistant. The testers ran all submissions through ten unit tests to check correct functionality, and the result was clearly in favor of AI assistance: 60.8 percent of the programs written with Copilot passed all ten tests, compared with only 39.2 percent of those written without AI help.
(Image: GitHub)
25 selected developers whose code had passed all ten tests were then asked to carry out blind, anonymized reviews of the programs, with each program reviewed a total of ten times. In this step, errors no longer meant functional defects but qualitative ones relating to consistency or readability: inconsistent naming, unclear identifiers, overly long lines, too deeply nested loops, missing comments, repeated expressions (violating "don't repeat yourself", DRY), or unclean division of functions.
Here too, AI assistance performed well, though less decisively. On average, the reviewers found 4.63 defects in programs written with AI help and 5.35 in those without. Code written with Copilot also contained more lines per defect: 18.2 lines compared with 16. This counters older studies that warned of bloated code and, in particular, violations of the DRY principle; other recent studies likewise support the conclusion that AI improves quality.
(Image: GitHub)
In addition, the reviewers were asked to make softer judgments about how readable, reliable, maintainable, and concise the code was. This yields an advantage of 3 to 5 percent for Copilot, although the study does not fully explain how these values were derived.
The study also examined the test subjects' commits, which were more frequent and smaller with AI help.
In conclusion, the study suggests: "Our hypothesis is that because developers need less time to make the code functional, it allows them to focus more on refining the quality."
(who)