Linux 6.12: Scheduler now extensible and EEVDF conversion complete

After much debate, the “extensible scheduler” has finally made it into the official kernel. A deadline server for real-time environments is another new feature.

Penguin sits in front of a computer that displays a penguin and the word "Linux"

(Image: created with AI in Bing Designer by heise online / dmk)

By Thorsten Leemhuis

Linux 6.12, expected on November 18 or 25, brings three major changes to the code that controls when and how long processes use the processor. The most hotly anticipated is the Extensible Scheduler Class, known as Sched_Ext, which allows the process scheduler to delegate many processor time allocation decisions to BPF programs. Knowledgeable users can write these themselves and load them into the kernel in order to adapt the time distribution to their needs without having to change the Linux source code.

The kernel developers have also completed the overhaul of the scheduling algorithm that began in Linux 6.6, which lets the process scheduler use the "Earliest Eligible Virtual Deadline First" (EEVDF) method. The finishing touches make it possible to reduce latency, mostly for applications that only run for short periods.

The third new feature is the "SCHED_DEADLINE Server Infrastructure": in systems with real-time processes, it is intended to better ensure that low-priority applications continue to receive sufficient processor time. This feature is the last major change made by a central and respected developer who passed away in June at the age of 37.

The Extensible Scheduler Class was largely driven by Meta developers: the company already uses Sched_Ext itself to better adapt the allocation of processor time to the needs of its huge data centers. To this end, developers write BPF programs for the workloads of the various server classes; these are then loaded into the kernel at runtime, which executes them with the BPF virtual machine in kernel context. Such BPF programs run less shielded than regular application programs, but are subject to some security restrictions. In return, they can interact much faster with the kernel and directly access the data it processes. BPF programs are already widely used in the Linux environment, for example in Systemd security mechanisms, for performance analysis or for high-performance control of network data streams.

Meanwhile, some developers and companies are already working on Sched_Ext programs to optimize the allocation of processor time for larger user groups – for gamers, for example, in order to avoid stutters in resource-demanding games. It is to be expected that some distributions will include such BPF programs in the future and activate them temporarily when games are started. They will have to disclose the source code, as Sched_Ext programs, like the kernel code, must be subject to the GPLv2 or a compatible license.

It is likely that all kinds of Sched_Ext programs will soon be in circulation. As is usual with such extension interfaces, some of them are likely to work around problems and functional gaps in the process scheduler that would probably be better fixed in the scheduler's C code. In the worst case, this can cause problems for users, for example if they want to use a Sched_Ext extension for gaming that works poorly or not at all alongside another one that optimizes performance on processors with CPU cores of different speeds.

Because of these and numerous other aspects, several developers of the Linux process scheduler have spoken out against the inclusion of Sched_Ext, often very clearly. The other side has argued, among other things, that Sched_Ext makes it easier to experiment with new scheduling approaches and could thus perhaps promote the further development of the regular scheduler. Linus Torvalds' attitude was uncertain for a while. For many years, he advocated the approach that "the kernel should only have one process scheduler that covers all areas of application", which is why Con Kolivas' alternative "Brain Fuck Scheduler" (BFS) and other schedulers were left out for years, even though they are extremely popular in certain circles.

About a year ago at the annual Kernel Maintainer Summit, however, Torvalds spoke out clearly in favor of including Sched_Ext. Nothing happened for the time being. A few months ago, he then indicated that he would bypass the developers of the regular process scheduler and integrate Sched_Ext into Linux 6.11 despite their criticism if no agreement was in sight. As a result, both sides tweaked some details again so that everyone could at least come to terms with the whole thing a little better; in the end, it became 6.12 instead of 6.11.

The LWN.net articles "The extensible scheduler class" and "Another push for sched_ext" provide some background information on the whole controversy and Sched_Ext in general. Further insights can be found in the letter accompanying the Sched_Ext patches, the description in the merge commit and the technical documentation for Sched_Ext.

Coincidentally, the developers of the regular scheduler have at the same time completed and refined the switch to distributing CPU time with the "Earliest Eligible Virtual Deadline First" (EEVDF) method, which began in Linux 6.6. Among other things, this benefits applications that need to react quickly but usually only run for a short time.

The kernel can now run such processes more frequently and, if necessary, can also take the CPU away from other applications even if they have not yet used up their current time slice. Previously, the kernel only did this to run processes with real-time priority. However, the kernel does not choose this path on its own, but only for processes that explicitly request shorter time slices via sched_setattr() and sched_attr::sched_runtime. Such processes are ultimately executed more frequently, which reduces latency. At the same time, each of their runs is shorter, because to avoid unfairness they end up receiving just as much CPU time as other processes with the same priority.
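
The trade-off described here, more frequent but shorter runs at an unchanged CPU share, can be illustrated with a toy model. The following Python sketch is not the kernel's implementation: it assumes equal task weights, ignores preemption and sleeping, and the task names and slice lengths are hypothetical. It only mimics the core EEVDF rule of picking, among the tasks that have not yet received more than their fair share, the one with the earliest virtual deadline.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    slice_us: int      # requested time slice (hypothetical value)
    vruntime: int = 0  # CPU time received so far (equal weights assumed)
    deadline: int = 0  # virtual deadline of the current request
    runs: int = 0      # how often the task was picked


def simulate(tasks, total_us):
    """Toy EEVDF loop: run the eligible task with the earliest deadline."""
    for t in tasks:
        t.deadline = t.vruntime + t.slice_us
    clock = 0
    while clock < total_us:
        total_vruntime = sum(t.vruntime for t in tasks)
        # Eligible: the task has not received more than the average share.
        eligible = [t for t in tasks if t.vruntime * len(tasks) <= total_vruntime]
        current = min(eligible, key=lambda t: t.deadline)
        # Simplification: the task always runs for one full requested slice.
        current.vruntime += current.slice_us
        current.deadline = current.vruntime + current.slice_us
        current.runs += 1
        clock += current.slice_us
    return tasks


# Two tasks with 3 ms slices and one that asks for 1 ms slices.
for t in simulate([Task("A", 3000), Task("B", 3000), Task("C", 1000)], 90_000):
    print(t.name, t.runs, t.vruntime)
# A 10 30000
# B 10 30000
# C 30 30000
```

In this model, task C is picked three times as often as A and B, yet all three end up with the same amount of CPU time.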

The documentation for the change that implements this explains the whole thing in more detail using clever ASCII art; LWN.net provides further details on the conversion and on scheduling with shorter time slices in the article "Completing the EEVDF scheduler". The merge commentary for the major scheduler changes also roughly outlines these and other improvements to EEVDF.

The merge commentary also mentions the third major change: the SCHED_DEADLINE server infrastructure. It is intended for systems that run real-time applications and regulate their time allocation using the Deadline Scheduling Class of the regular scheduler. In the past, real-time applications could monopolize the processor to a large extent, so that processes with regular priorities were not given sufficient time. An approach known as "real-time throttling" was intended to prevent this, but it often worked rather poorly. The new infrastructure takes a different approach and ensures that regular processes receive at least five percent of the processor time in the standard configuration. The LWN.net article "Deadline servers as a realtime throttling replacement" provides further insights into the process.
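
A guarantee like the five percent mentioned here comes down to a runtime budget divided by a period. As a rough sketch: the budget of 50 ms per 1 s period used below is an assumption chosen to match the five percent from the text, and the function name is hypothetical rather than part of any kernel interface.

```python
NSEC_PER_MSEC = 1_000_000


def reserved_percent(runtime_ns: int, period_ns: int) -> float:
    """Share of each period reserved for regular tasks, in percent."""
    if period_ns <= 0 or not 0 <= runtime_ns <= period_ns:
        raise ValueError("need period > 0 and 0 <= runtime <= period")
    return 100 * runtime_ns / period_ns


# Assumed example values: 50 ms of runtime in every 1000 ms period.
share = reserved_percent(50 * NSEC_PER_MSEC, 1000 * NSEC_PER_MSEC)
print(f"regular tasks: {share:.0f}%, left for real-time tasks: {100 - share:.0f}%")
# regular tasks: 5%, left for real-time tasks: 95%
```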

The driving force behind this new infrastructure was Daniel Bristot de Oliveira. It is his last significant contribution to Linux, as he passed away in June at the age of 37. Bristot was a highly dedicated and respected developer in the Linux realm for many years.

Portrait photo of the late Daniel Bristot de Oliveira.

(Image: bristot.me / Daniel Bristot de Oliveira)

Several dozen developers remembered him at a conference last week in a "Celebration of Life". Less than two hours later and just a few meters away, Linus Torvalds was presented with the pull request that ensures that, after 20 years of painstaking work, Linux now comes with real-time capabilities out of the box. This achievement is also largely thanks to Bristot.

(dahe)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.