Linux 6.13 with new multitasking

Linux 6.13 improves multitasking and brings critical fixes. The new kernel comes with fine-grained time stamps and secure virtualization on ARM64.

Penguin sits in front of a computer that displays a penguin and the word "Linux"

(Image: created with AI in Bing Designer by heise online / dmk)

By Oliver Müller

The new Linux kernel 6.13 was released on schedule after seven release candidates, in the night from Sunday to Monday. What made the release exciting was the inclusion of critical patches, which are also to be backported to earlier kernels.

A major area of work is the renewed tuning of the scheduler. The multitasking modes are shifting and shrinking from four to three. In the PowerPC area, it's time to say goodbye.

Every modern operating system kernel, Linux included, relies on “preemptive multitasking”. “Preemptive” means roughly “forestalling” or “interrupting” (preemption). To create the illusion of programs running in parallel, processes (or threads) are given time slices in which they run on the processor or one of its cores. Once a time slice has elapsed, the kernel saves the state of the current process, restores the saved state of another process and lets it run for its share of computing time. This changeover is the context switch.
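The time-slice principle described above can be illustrated with a small round-robin simulation. This is only a sketch of the idea, not how the Linux scheduler is implemented; the names Task and round_robin and the abstract "work units" are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining: int  # abstract work units the task still needs


def round_robin(tasks, time_slice):
    """Return the sequence of (task name, units run) in CPU order."""
    queue = deque(tasks)
    timeline = []
    while queue:
        task = queue.popleft()              # "restore" the next task
        ran = min(time_slice, task.remaining)
        task.remaining -= ran
        timeline.append((task.name, ran))
        if task.remaining > 0:
            queue.append(task)              # "save" it until its next slice
    return timeline


if __name__ == "__main__":
    print(round_robin([Task("A", 5), Task("B", 2)], time_slice=2))
```

Each entry in the returned list corresponds to one time slice; every change of task name between two entries stands for a context switch.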

Previously, the Linux kernel had four selectable preemption modes. PREEMPT_NONE is the simplest form: it only switches to the next process once the time slice of the running process has expired. PREEMPT_VOLUNTARY can additionally interrupt the running process at a number of predefined points in the kernel, even if its time slice has not yet expired.


PREEMPT_FULL goes one step further. It allows a process to be interrupted at any time. The only exceptions are places where the kernel explicitly does not allow it, for example while the process or thread holds a spinlock. A spinlock is a lock that prevents concurrent access to a resource by several processes. Under mutual exclusion, it can only ever be held by one process at a time. The others have to wait until the holder releases the spinlock and thus frees the resource.
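The mutual exclusion described above can be sketched in user space. The following is a toy spinlock built on Python's threading.Lock with non-blocking acquire; it is not the kernel's spinlock, and the names SpinLock and run_counter are made up for this illustration.

```python
import threading


class SpinLock:
    """Toy spinlock: waiters loop ("spin") instead of sleeping."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Try again and again until the current holder releases the lock.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()


def run_counter(n_threads=4, increments=200):
    """Increment a shared counter from several threads under the lock."""
    lock = SpinLock()
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            lock.acquire()
            try:
                total += 1      # critical section: one thread at a time
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_counter())
```

The sketch also hints at why interrupting a lock holder is costly: while the holder is not running, every waiter burns processor time spinning without making progress.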

Finally, PREEMPT_RT ties the context switch to many further criteria for real-time operation. With PREEMPT_RT, it is also possible to interrupt a process while it is holding a spinlock.

The responsiveness of the system increases from PREEMPT_NONE to PREEMPT_RT. The system can react more quickly to events. Such an event can be a keystroke on the keyboard, a mouse movement, or an interrupt. This fast response comes at the cost of a possible “chopping up” of processes, especially long-running ones.

Tasks with an intensive processor load benefit from running undisturbed for a long time. If they are chopped up, the throughput of the system decreases. In addition, depending on the preempt mode, each interruption can leave resources newly blocked. Put simply, a process has to get its bearings again after every interruption. Short-running tasks, on the other hand, are less bothered by interruptions, because they offer fewer opportunities for “chopping up” their time slice.

There is therefore no universally suitable model for preemptive multitasking. Depending on the workload, i.e., the time and resource requirements of a task, one mode or another in the kernel is more suitable. This is why most distributions ship the kernel with the pseudo mode PREEMPT_DYNAMIC. This is nothing more than the option to select any of the modes, except PREEMPT_RT, at boot time. The default setting is PREEMPT_VOLUNTARY.

If you want to see which mode is currently active on your Linux system, you can, depending on the kernel and configuration, read it from /sys/kernel/debug/sched/preempt. A simple cat on the pseudo file returns the mode. If PREEMPT_DYNAMIC is active, the output is “none (voluntary) full”, with the current mode marked by parentheses. The mode can also be set at boot with the preempt parameter on the kernel command line (cmdline), for example preempt=full for PREEMPT_FULL.
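As a small illustration of the output format mentioned above, the following sketch parses a line such as “none (voluntary) full” and looks for a preempt= argument in a kernel command line. The function names are made up for this example. Reading the debugfs file usually requires root privileges and a mounted debugfs, so the sketch returns None if the file cannot be read.

```python
from pathlib import Path

PREEMPT_FILE = Path("/sys/kernel/debug/sched/preempt")


def parse_preempt(text):
    """Split e.g. 'none (voluntary) full' into (active mode, all modes)."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("(") and token.endswith(")"):
            token = token[1:-1]     # strip the parentheses
            active = token
        available.append(token)
    return active, available


def preempt_from_cmdline(cmdline):
    """Return the value of a preempt= boot argument, or None if absent."""
    for arg in cmdline.split():
        if arg.startswith("preempt="):
            return arg.split("=", 1)[1]
    return None


def read_preempt(path=PREEMPT_FILE):
    """Read and parse the pseudo file; None if it is not accessible."""
    try:
        return parse_preempt(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_preempt())
```

On a running system, the boot arguments can be taken from /proc/cmdline and passed to preempt_from_cmdline.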

Depending on the preempt mode, there are many places in the kernel where a context switch could take place before the time slice expires. At each of them, the kernel checks a number of variables to decide whether the current process should be interrupted. This chops up the process flow every time and generates overhead, which is not very efficient.

The new kernel adds a new mode. “Lazy Preempt” (PREEMPT_LAZY) delays the context switch if a process could be interrupted but the interruption is not urgently required. Instead of switching context immediately, the kernel postpones the switch until a favorable point in time is reached. Such a point could be the end of a critical section or the next return from kernel space to user space. Only at critical moments does the kernel actually interrupt the process immediately.
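The difference between immediate and lazy preemption can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the kernel's logic: times are plain numbers, safe_points stands for the “favorable points” mentioned above, and the deadline parameter (a latest point by which the switch happens anyway) is an assumption of this model.

```python
def switch_time(request, urgent, safe_points, deadline, mode="lazy"):
    """Return when the context switch happens in this toy model.

    request     -- time at which a reschedule is requested
    urgent      -- True if the switch must not be delayed
    safe_points -- times of favorable points (e.g. end of a critical section)
    deadline    -- assumed latest time for the switch (model assumption)
    mode        -- "full" switches immediately, "lazy" may defer
    """
    if mode == "full" or urgent:
        return request
    # Lazy: wait for the first favorable point after the request,
    # but never longer than the deadline.
    candidates = [t for t in safe_points if request <= t <= deadline]
    return min(candidates, default=deadline)


if __name__ == "__main__":
    print(switch_time(3, False, [5, 9], deadline=10))               # deferred
    print(switch_time(3, True, [5, 9], deadline=10))                # immediate
    print(switch_time(3, False, [5, 9], deadline=10, mode="full"))  # immediate
```

In this model, a non-urgent request in lazy mode lets the running task continue to the next favorable point, while urgent requests behave as in full mode.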

This new mode is a mixture of PREEMPT_NONE and PREEMPT_VOLUNTARY and is meant to replace both modes. PREEMPT_FULL is retained, as is PREEMPT_RT.

The scheduler also received a critical last-minute patch for the EEVDF scheduler, where an entity placement bug caused scheduling delays in some situations.

Linus Torvalds merged this critical patch into the new kernel on Sunday, at the last minute. The patch will also be backported to last year's kernels.

Some applications depend on timestamps for synchronization and for tracking changes to such a degree that the normal resolution is not sufficient. One such case is NFSv3, which uses the timestamp to recognize whether a cache is still valid. If time is quantized in intervals that are too large, a client may end up working with an old version of a file. Inconsistencies are then inevitable.
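The quantization problem can be shown with a few lines of arithmetic. The sketch below rounds times down to a given granularity and checks whether two writes receive different timestamps. The 4 ms value for COARSE_NS is an assumed example, not a figure from the kernel, and the function names are made up.

```python
COARSE_NS = 4_000_000  # assumed coarse granularity (4 ms), illustration only


def stamp(t_ns, granularity_ns):
    """Round a time in nanoseconds down to the given granularity."""
    return (t_ns // granularity_ns) * granularity_ns


def client_sees_change(write1_ns, write2_ns, granularity_ns):
    """True if the two writes get different timestamps, so that a
    timestamp-based cache check would notice the second write."""
    return stamp(write1_ns, granularity_ns) != stamp(write2_ns, granularity_ns)


if __name__ == "__main__":
    # Two writes 2 ms apart, inside the same coarse interval.
    print(client_sees_change(1_000_000, 3_000_000, COARSE_NS))  # coarse
    print(client_sees_change(1_000_000, 3_000_000, 1))          # nanoseconds
```

With the coarse granularity, both writes carry the same timestamp, so a client that cached the file after the first write cannot tell that it is stale.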

On the other hand, timestamps that are too fine-grained mean that the metadata in the file system has to be updated much more frequently. If, for example, times are recorded only with one-second resolution, the system rewrites the metadata at most once per second. If times are recorded at the millisecond level, the system must update the metadata up to a thousand times per second in extreme cases.

Linux 6.13 introduces these fine-grained timestamps in a smart way. The normal timestamps with millisecond resolution are retained and remain the default. The system only uses and updates the finer timestamps if an application explicitly requests them. In this way, the kernel manages the balancing act between performance (not “writing itself to death” with timestamps) and the requirements of special time-sensitive applications.
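The trade-off can be modeled in a few lines. The following sketch is a toy model and not the kernel implementation: it assumes that reading a file's timestamp counts as the “request” for fine granularity on the next change, and the class name FileTimes and the 4 ms value of COARSE_NS are assumptions made for the example.

```python
COARSE_NS = 4_000_000  # assumed coarse granularity (4 ms), illustration only


class FileTimes:
    """Toy model: coarse timestamps by default, fine ones on request."""

    def __init__(self):
        self.mtime = 0
        self.fine_requested = False
        self.metadata_writes = 0

    def read_mtime(self):
        # Someone looked at the timestamp, so the next change
        # must be distinguishable from the value just returned.
        self.fine_requested = True
        return self.mtime

    def modify(self, now_ns):
        if self.fine_requested:
            new = now_ns                                # fine-grained
            self.fine_requested = False
        else:
            new = (now_ns // COARSE_NS) * COARSE_NS     # coarse
        new = max(new, self.mtime)   # never let the timestamp go backwards
        if new != self.mtime:
            self.mtime = new
            self.metadata_writes += 1


if __name__ == "__main__":
    f = FileTimes()
    f.modify(5_000_000)
    f.modify(6_000_000)          # same coarse interval, no metadata write
    seen = f.read_mtime()
    f.modify(7_000_000)          # fine timestamp, differs from `seen`
    print(seen, f.mtime, f.metadata_writes)
```

In this model, a burst of writes that nobody observes costs a single metadata update per coarse interval, while an observed file still gets a distinguishable timestamp on its next change.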

Linux now supports atomic writes for XFS, Ext4 Direct I/O and some soft RAID modes. These are necessary if data is to be written that is larger than the sector size specified by the storage hardware. Several sectors in a write operation can then be treated as one logically coherent area. With an “atomic write”, either all sectors are written in one write operation or none.
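The all-or-nothing guarantee can be illustrated with a toy storage model. This sketch does not use any kernel interface; the list of sectors, the function write_sectors and the fail_after parameter (simulating an interruption such as a power loss) are assumptions made for the illustration.

```python
def write_sectors(disk, start, new_data, fail_after=None, atomic=False):
    """Write new_data (one item per sector) into a copy of disk.

    fail_after -- simulate an interruption after this many sectors
    atomic     -- if True, an interrupted write leaves the disk unchanged
    """
    result = list(disk)
    interrupted = fail_after is not None and fail_after < len(new_data)
    if atomic and interrupted:
        return result                       # none of the sectors written
    count = fail_after if interrupted else len(new_data)
    result[start:start + count] = new_data[:count]
    return result


if __name__ == "__main__":
    disk = ["a", "a", "a", "a"]
    print(write_sectors(disk, 1, ["b", "b"], fail_after=1))               # torn
    print(write_sectors(disk, 1, ["b", "b"], fail_after=1, atomic=True))  # old
    print(write_sectors(disk, 1, ["b", "b"]))                             # new
```

Without atomicity, an interrupted write leaves a mixture of old and new sectors; with it, a reader only ever sees the complete old or the complete new state.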

On ARM64, Linux guests can run in secure virtual machines (VM). Linux implements the “ARM Confidential Compute Architecture” (CCA) for this purpose. This keeps the VM's memory area hidden from the hypervisor's eyes.

Linux 6.13 also supports “Guarded Control Stack” in user space. This is the shadow stack variant from ARM.

To free the kernel from ballast, Linux 6.13 also cuts out obsolete features. This time, the PowerPC 970FX is affected. Many people still know this processor in “Apple-speak” as the PowerPC G5. But we can breathe a sigh of relief: the old Macs with G5 will continue to be supported.

The evaluation boards “Maple” with 970FX are affected by the removal from the kernel. However, according to the commit message, IBM JS20/JS21 blades are also affected. These PowerPC blades have no future under Linux in the medium to long term. The same applies to the YDL Powerstations, which “Yellow Dog Linux” breathed life into at the time.

However, the supposedly closed door to Linux-land remains open a crack. According to the commit message, the changes can be reversed should an outcry from affected users reach the kernel team.

The lazy mode for preemptive multitasking in the scheduler promises more efficiency and a better balance between responsiveness and process interruptions. The last-minute patch in the scheduler is critical and should also be included in older kernels. This includes the current kernel 6.12 with long-term support, on which some distributions will be based. Unfortunately, due to time pressure, this patch will only be tested in the wild.

The fine-grained timestamps have finally made it into the kernel. They were already planned for Linux 6.6, but were removed in the third release candidate, 6.6-rc3. The atomic write operations had also been expected for some time, and 6.13 delivers here as well.

Linux 6.13 not only brings new drivers and bug fixes. It is a solid further development and not just a maintenance release. Let's hope that last-minute patches don't spoil the soup.

As always, the new kernel is available for download in source code at kernel.org. The detailed changelog provides information about all the changes to the new Linux heart.

(olb)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.