Race against Rust: GNU Coreutils get faster
The Rust implementation of Coreutils recently recorded a noticeable performance increase. Now the GNU developers are following up.
With version 9.11, the developers of GNU Coreutils report higher performance for their tools across various processor architectures. Some of the elementary system programs for Linux and Unix run up to 15 times faster than in the previous version. In addition to bug fixes and other improvements, the developers have improved the compatibility of cat with other Unix implementations.
Fewer copy operations, more speed
The largest performance gain goes to yes, whose data throughput on Power10 systems rises from 11.6 GByte/s to 175 GByte/s. To achieve this, the developers use a zero-copy I/O implementation under Linux. Instead of copying data back and forth between kernel and user space, the new variant uses modern kernel functions such as sendfile() to avoid copy operations. The same approach makes cat six times faster on Power10 processors and five times faster on AMD64 processors.
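The zero-copy idea can be illustrated with a minimal Python sketch. It is not the Coreutils code, which is written in C; the function name copy_with_sendfile and the chunk size are assumptions made for illustration. The sketch asks the kernel to move the bytes directly between two file descriptors via os.sendfile() and falls back to an ordinary buffered copy where that call is unavailable or refused.

```python
import os
import shutil


def copy_with_sendfile(src_path, dst_path, chunk_size=1 << 20):
    """Copy a file, letting the kernel move the bytes where possible.

    Hypothetical helper for illustration only. Returns the size of the
    destination file in bytes.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        try:
            while remaining > 0:
                # The data never passes through a user-space buffer here.
                sent = os.sendfile(dst.fileno(), src.fileno(),
                                   offset, min(chunk_size, remaining))
                if sent == 0:  # input ended earlier than expected
                    break
                offset += sent
                remaining -= sent
        except (AttributeError, OSError):
            # No usable sendfile() on this platform or for these files:
            # continue with a classic read/write copy from where we stopped.
            src.seek(offset)
            dst.seek(offset)
            shutil.copyfileobj(src, dst)
    return os.path.getsize(dst_path)
```

Whether the kernel path is actually taken depends on the platform and the file types involved, so the fallback keeps the sketch usable elsewhere.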
In addition to the performance improvements, the developers are extending several programs to handle multi-byte characters. The tools cut, nl, unexpand, and expand can now correctly process Unicode text containing emojis or non-Latin scripts. Previously, these tools operated purely on bytes, which could lead to incorrect results with multi-byte encodings.
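Why byte-based processing goes wrong can be shown with a small Python sketch. The helpers cut_bytes and cut_chars are hypothetical and only mimic the idea of keeping the first n units of a line; they do not reproduce the behavior of the GNU tools. Note that Python strings count code points, so emoji built from several code points would still need extra care.

```python
def cut_bytes(line, n):
    """Keep the first n bytes of the UTF-8 encoding (byte-based view)."""
    # A cut in the middle of a multi-byte sequence leaves invalid UTF-8,
    # which is rendered here as the replacement character U+FFFD.
    return line.encode("utf-8")[:n].decode("utf-8", errors="replace")


def cut_chars(line, n):
    """Keep the first n code points (character-based view)."""
    return line[:n]


print(cut_bytes("héllo", 2))  # 'h' plus U+FFFD: the 'é' was split in half
print(cut_chars("héllo", 2))  # 'hé'
```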
More options for cut
Furthermore, cut gains three new options. cut -w splits fields on any whitespace, such as spaces or tabs, instead of a fixed delimiter, which also improves compatibility with FreeBSD and macOS. cut -O specifies the character that should appear between output fields, and cut -F is an alias for the combination of these two options. This behavior matches the cut implementations in BusyBox and Toybox.
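The combination of whitespace splitting and a separate output delimiter can be sketched in a few lines of Python. The function cut_whitespace_fields is hypothetical. It assumes that a run of spaces and tabs counts as a single separator and that leading whitespace is ignored; the exact rules of the real cut options may differ.

```python
def cut_whitespace_fields(line, fields, out_delim=" "):
    """Select 1-based fields from a whitespace-separated line.

    Assumption: runs of whitespace act as one separator, which is what
    str.split() without arguments does. Field numbers beyond the end of
    the line are skipped.
    """
    parts = line.split()
    return out_delim.join(parts[i - 1] for i in fields if 1 <= i <= len(parts))


print(cut_whitespace_fields("alpha  beta\tgamma", [1, 3], ","))  # alpha,gamma
```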
The developers are also extending date, which can now process dates in dd.mm.yy format with dots as separators. Additionally, cksum --check now handles filenames with unusual characters more safely thanks to more robust quoting. This prevents potential problems when checking the integrity of files whose names contain special characters or spaces.
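For comparison, this is how the dotted dd.mm.yy format can be parsed in Python. The helper parse_dotted_date is hypothetical, and the century chosen for two-digit years follows Python's own rule (69 to 99 become 1969 to 1999, 00 to 68 become 2000 to 2068); how GNU date resolves the century is not stated here.

```python
from datetime import date, datetime


def parse_dotted_date(text):
    """Parse a dd.mm.yy string into a date object."""
    return datetime.strptime(text, "%d.%m.%y").date()


print(parse_dotted_date("24.12.25"))  # 2025-12-24
```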
Rust Coreutils as an alternative
The GNU Coreutils, written in C, form the foundation of every GNU/Linux system. Standard programs such as ls, cp, cat, mv, or wc are among the most frequently used tools on the command line. An alternative is the Rust reimplementation uutils coreutils, which has also recently shown performance leaps and is now 96 percent compatible with the GNU tool collection. Some Linux distributions, such as Ubuntu, already ship the Rust counterpart by default.
Overall, the update brings nearly 30 changes and bug fixes. All changes in GNU Coreutils 9.11 can be found in the changelog.
(sfe)