Numbers, please! Linux – soon to weigh 40 million lines of code
Linux adds around four hundred thousand lines of code every two months. A slowdown or even an end to growth is not in sight – on the contrary!

When Linux 6.13 was released at the beginning of the week, on January 20, the sources of the kernel called Linux consisted of exactly 39,819,522 lines – code, comments, blank lines, documentation, build infrastructure and the like included. Since an average of roughly four hundred thousand lines is added every nine or ten weeks, the kernel is expected to break through the 40-million-line mark at the end of January 2025, during the main development phase of version 6.14.
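Such a figure can be roughly reproduced with a plain line count over a kernel checkout – a minimal sketch, assuming a Git checkout of the v6.13 tag (the exact number varies slightly depending on what is counted):

    # fetch only the 6.13 sources and count every line, comments and blanks included
    git clone --depth 1 --branch v6.13 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    cd linux
    find . -type f -not -path './.git/*' -print0 | xargs -0 cat | wc -l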
This growth is a massive thorn in the side of some people, as forum posts here and elsewhere show. To a certain extent, these concerns are justified. At the same time, however, it is important not to forget what a miserable yardstick the number of lines of code often is. Throwing out functions or drivers would certainly reduce the amount of kernel code immensely, but user-friendliness, code quality and security would suffer at the same time. A closer look shows why.
Most of the code is irrelevant to any given user
The weaknesses of this yardstick already become apparent when looking at the code for the different hardware architectures: it alone currently totals almost 4.4 million lines. The compiler does not even look at the majority of it when building Linux, however, as a compiled kernel only supports one processor architecture anyway.
For today's common x86-64 CPUs, only the x86 architecture code is relevant, which currently amounts to 493,010 lines. But even large parts of that are left out, as the directory contains not only the code for modern 64-bit x86 processors but also that for their 32-bit predecessors.
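How unevenly that architecture code is distributed can be seen with a quick per-directory count – a rough sketch run from the top of a kernel source tree:

    # total lines per architecture directory, largest entries last
    for d in arch/*/; do
        printf '%s %s\n' "$(find "$d" -type f -print0 | xargs -0 cat | wc -l)" "$d"
    done | sort -n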
The blueprint sorts out a lot
The compiler also leaves out a lot of other code, because Linux uses a rather monolithic kernel design – and therefore contains drivers in addition to the essential functions of a modern operating system kernel. Not just a few, but tens of thousands, which together add up to around 25 million lines.
When building a kernel for your own systems, however, much of this driver code is irrelevant. Just like the architecture code, some drivers require specific platforms and cannot even be compiled on 64-bit x86 systems. And even where that is possible, it does not mean a driver will actually be compiled: this is decided by whoever defines the blueprint (the ".config" file) via steps such as "make menuconfig" or "make xconfig" before building a kernel.
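What such a blueprint sorts out is easy to inspect on a running distribution kernel – a sketch that assumes the configuration is installed under /boot, as is typical for Debian and similar systems:

    # the configuration the running kernel was built from
    CONFIG=/boot/config-$(uname -r)
    grep -c '=y' "$CONFIG"            # options built into the kernel image
    grep -c '=m' "$CONFIG"            # drivers and features built as loadable modules
    grep -c 'is not set' "$CONFIG"    # everything the blueprint left out entirely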
Loading unneeded drivers is easy to suppress
Linux experts may now be thinking: that still leaves much of the driver code relevant – and a potential security risk – because mainstream distributions such as Debian activate the majority of the drivers available for their respective platforms. A valid objection, but one that should not be overestimated: these drivers are almost always compiled as modules, most of which the kernel never even loads.
A Gnome desktop installation of Debian GNU/Linux 12.8 in a VM therefore loads only just over a hundred of its roughly four thousand kernel modules into RAM. Running directly on hardware, that number quickly doubles – but even then, the kernel loads only around five percent of all modules.
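Anyone can check the ratio on their own system – a quick sketch comparing the modules shipped for the running kernel with those actually loaded:

    # modules installed for the running kernel (often compressed, hence *.ko*)
    find /lib/modules/$(uname -r) -name '*.ko*' | wc -l
    # modules currently loaded; subtract one for the lsmod header line
    lsmod | wc -l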
Linux even refuses to load many modules outright if it does not find the hardware their drivers support. That does not help with modules for file systems, network protocols or other hardware-independent functions, though. But anyone looking for protection against gaps in their code can easily forbid the kernel from loading any further modules after system startup with a simple "echo 1 > /proc/sys/kernel/modules_disabled".
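A brief sketch of how that switch behaves in practice – note it is one-way, only a reboot re-enables module loading; the btrfs module here is just an arbitrary example:

    # load everything the system needs first, then flip the switch
    echo 1 | sudo tee /proc/sys/kernel/modules_disabled
    # any later attempt to load a module should now fail with "Operation not permitted"
    sudo modprobe btrfs
    # the flag reads back as 1 and cannot be reset to 0 at runtime
    cat /proc/sys/kernel/modules_disabled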
Growth will continue
The growth will continue, though, because new products and technologies keep appearing, most of which someone ends up supporting in Linux. Against this never-ending flood, occasional clean-ups are little more than a drop in the ocean, as the kernel developers usually only throw out old drivers when nobody is likely to be using the supported hardware productively anymore. Which is one reason why many people appreciate Linux: it can often give a second lease of life to PCs that are well over a decade old.
At least the growth appears to have leveled off at around four hundred thousand lines of code every nine or ten weeks for several years now. Individual versions deviate considerably from this average every now and then, however: in rare cases, the kernel even shrinks with a new version, while in others it grows by over a million lines.
The main reason for such rapid weight gains is usually machine-generated header files with definitions for addressing the hardware: these can easily take up several megabytes. The include files for modern AMD graphics chips contained in Linux alone now add up to five million lines – most of which the compiler never even looks at, because the files also serve as a living documentation of the graphics chips' properties.
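The bulk of these definitions sits in the headers of the AMD graphics driver; a rough sketch for counting them in a kernel source tree (the exact figure depends on the kernel version):

    # lines in the AMD GPU driver's header files, register definitions included
    find drivers/gpu/drm/amd -name '*.h' -print0 | xargs -0 cat | wc -l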
Nevertheless, the developers are currently discussing moving unused definitions of this driver out of the kernel sources, which could shrink it enormously. Advocates of "less code is better" would certainly welcome this. But less code is not an end in itself; it is supposed to improve security, maintenance and performance – advantages that would be marginal with this planned slimming, if they materialized at all.
Everything is relevant
It would also be detrimental to these goals if rarely used, older or, as some people demand, all drivers were thrown out. The need for those drivers would not go away, so many of them would end up being maintained externally. That considerably weakens the many-eyes principle, because the code and changes to it would no longer pass through the quality control of experienced kernel maintainers before being merged.
Such code would also never reach the many test systems that constantly build and test the official kernel or look for performance changes. And it would escape the attention of developers from efforts such as the Kernel Self Protection Project, who continuously rework various areas of the Linux code to improve system security and kernel robustness.
In some cases, moving drivers out would even lead to more driver code overall, as practical experience with drivers maintained by hardware manufacturers outside the official kernel sources shows. Instead of extending an existing WLAN driver to support a newer generation of a WLAN chip, for example – as Linux kernel developers do whenever it makes sense – manufacturers are far more likely to create a new driver based on the previous one. They then have to apply bug fixes or optimizations in several places instead of one, which, experience shows, goes wrong more often than it succeeds, even with the best of intentions. This is one reason why many manufacturers do not maintain such drivers adequately, or usually only for a few years.
Negative effects for users
Externally maintained kernel drivers would also make installing and maintaining drivers considerably harder for users, as they would no longer simply get them delivered along with the kernel. Distributions could pick such drivers up and package them, but that, of course, means a lot of extra work for everyone involved. In the end, probably nothing would be gained, and many things would be worse than they are now.
Some of the disadvantages mentioned could be avoided by creating another central location for drivers – but that would hardly have any advantages over the central location we have now. At least not as long as nobody gives Linux stable interfaces to decouple drivers from the rest of the kernel. This is an old and frequently heard demand, meant to make it easier for users to install and handle drivers, because they could then update drivers and the rest of the kernel independently of each other, as is the case with Windows.
Whether these advantages would actually materialize – and what enormous disadvantages such interfaces would mean for the development of Linux – is a topic of its own. One thing is clear, though: such interfaces require a lot of additional code for backward and forward compatibility – code that, as experience elsewhere shows, easily contains errors that lead to security vulnerabilities. With regard to "the leanest possible kernel", they would therefore be counterproductive.
A few kilos too many here and there
The above arguments may give the impression that there is no reason to slim down the code anywhere in the kernel. That is by no means the case: as with all complex software, there are plenty of such places in Linux. Here, as elsewhere, overhead gradually creeps in at many points, which is why newer kernels, for example, run worse and worse on systems with just a few megabytes of RAM.
But these problems are smaller, and the situation is more complicated, than it might appear at first glance. In a way it is a form of "bikeshedding": as with the much-cited bicycle shed at a newly built nuclear power plant, everyone thinks they have a say in the matter. But because things are more complicated than they look, some supposedly simple ways of slimming down turn out to be counterproductive or mere window-dressing – much like the quick diet promises of some magazines.
Just as with kilos on the hips, it is of course quite possible that the weight gain will eventually cause health problems for Linux too. So far, however, there is nothing to suggest that this will happen in the near future, even though people have been predicting it for more than two decades – as a look at the Heise forums on the reports marking the 10, 15, 20 and 25 million lines of code or the kernel's 30th birthday (31,479,666 lines) shows. On the contrary: Linux is still the most successful operating system kernel in the world, despite or perhaps even because of its size.
(dmk)