Dolt 2.0 Database: Branches and Commits for SQL Data
With version 2.0, the open-source database Dolt brings vector data, adaptive storage for large values, and a new storage foundation.
(Image: Dolt)
With Dolt 2.0, the storage engine of the versioned SQL database changes in many respects. For the first time, several central functions are active by default: automatic garbage collection and a new archive format for historical data. In addition, there is beta support for vector data and a new mechanism called Adaptive Storage for data types such as JSON and BLOB. According to the developers, Dolt now also achieves better Sysbench results than MySQL.
The open-source project Dolt is a relational database with Git-style version control. Developers can commit, branch, and merge databases and compare differences between data states. Technically, the project combines a MySQL-compatible SQL layer with its own versioned storage engine. Dolt is used for collaborative data maintenance, reproducible datasets, auditing, and data engineering workflows, among other things.
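To illustrate what Git-style versioning means for table data, the following Python sketch models commits as content-addressed snapshots, branches as named pointers to commits, and a diff as a row-by-row comparison keyed by primary key. It is a hypothetical toy model, not Dolt's storage engine or API: the class and method names are invented for illustration, and merging is omitted.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


def content_hash(obj) -> str:
    """Hash a JSON-serializable object deterministically (content addressing)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


@dataclass
class ToyRepo:
    """Hypothetical, heavily simplified model of Git-style versioning for one table."""
    objects: dict = field(default_factory=dict)   # hash -> stored snapshot or commit
    branches: dict = field(default_factory=dict)  # branch name -> commit hash

    def commit(self, branch: str, rows: dict, message: str) -> str:
        # Store the table snapshot (primary key -> row) under its content hash.
        table_hash = content_hash(rows)
        self.objects[table_hash] = copy.deepcopy(rows)
        commit = {"table": table_hash, "parent": self.branches.get(branch), "message": message}
        commit_hash = content_hash(commit)
        self.objects[commit_hash] = commit
        self.branches[branch] = commit_hash  # advance the branch pointer
        return commit_hash

    def branch(self, new_branch: str, from_branch: str) -> None:
        # A branch is just a named pointer to an existing commit.
        self.branches[new_branch] = self.branches[from_branch]

    def rows_at(self, commit_hash: str) -> dict:
        return self.objects[self.objects[commit_hash]["table"]]

    def diff(self, old_commit: str, new_commit: str) -> dict:
        # Compare two table states row by row, keyed by primary key.
        old, new = self.rows_at(old_commit), self.rows_at(new_commit)
        return {
            "added": sorted(k for k in new if k not in old),
            "removed": sorted(k for k in old if k not in new),
            "modified": sorted(k for k in new if k in old and new[k] != old[k]),
        }


repo = ToyRepo()
c1 = repo.commit("main", {"1": {"name": "Ada"}, "2": {"name": "Alan"}}, "initial data")
repo.branch("feature", "main")
c2 = repo.commit("feature", {"1": {"name": "Ada L."}, "3": {"name": "Grace"}}, "edit on branch")
print(repo.diff(c1, c2))  # {'added': ['3'], 'removed': ['2'], 'modified': ['1']}
```

The commit on the feature branch leaves main untouched, which is the property that makes branching useful for collaborative or experimental data changes.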
Automatic Cleanup
Garbage collection, now enabled by default, removes unreferenced data blocks in the background and is intended to simplify the operation of large or heavily branched databases. Storage requirements grow quickly in versioned databases in particular, because Dolt keeps every change in its history. Until now, administrators often had to start cleanup manually or via a scheduled job. The project had already introduced online garbage collection, which runs while the database is in operation, in an earlier release.
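The article does not describe Dolt's collection algorithm. A common approach for content-addressed stores is mark-and-sweep: starting from the roots (for example, branch heads), mark every block that is still reachable, then delete the rest. The following sketch assumes that approach; the chunk names and the graph are hypothetical, and an online collector would additionally have to cope with writes arriving during the run, which is not shown.

```python
def collect_garbage(chunks: dict, roots: list) -> set:
    """Remove chunks not reachable from any root and return the removed hashes.

    chunks maps a chunk hash to the list of child hashes it references.
    """
    reachable = set()
    stack = list(roots)
    while stack:  # mark phase: iterative depth-first traversal from the roots
        h = stack.pop()
        if h in reachable or h not in chunks:
            continue
        reachable.add(h)
        stack.extend(chunks[h])
    garbage = set(chunks) - reachable  # sweep phase: everything left unmarked
    for h in garbage:
        del chunks[h]
    return garbage


# Hypothetical store: commit c2 points to its parent c1 and a table t2, and so on.
# "old" is a leftover block that no branch references anymore.
chunks = {
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t1": [],
    "t2": ["r1"],
    "r1": [],
    "old": ["orphan"],
    "orphan": [],
}
removed = collect_garbage(chunks, roots=["c2"])
print(sorted(removed))  # ['old', 'orphan']
```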
The new archive format is also active by default. It stores historical data states more compactly, thus reducing storage requirements. During development, the project had cited savings of up to 50 percent. The format primarily targets databases with many snapshots or long change histories and is intended to facilitate long-term archiving and cold-storage scenarios.
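Why histories compress well can be illustrated with dictionary compression: successive versions of a row are nearly identical, so priming the compressor with an earlier version lets it encode the new one mostly as back-references. The sketch below uses zlib's `zdict` parameter from Python's standard library as a stand-in; the article does not say which technique the archive format uses, so whether Dolt works this way is an assumption, and the sample row is hypothetical.

```python
import zlib


def compress_with_dictionary(data: bytes, dictionary: bytes) -> bytes:
    # Prime the DEFLATE window with a similar earlier version (zdict).
    comp = zlib.compressobj(zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    # The same dictionary must be supplied again for decompression.
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical historical and current versions of the same row.
old_row = (b'{"id": 42, "name": "Ada Lovelace", "email": "ada@example.org", '
           b'"city": "London", "tags": ["math", "computing", "poetry"]}')
new_row = old_row.replace(b"London", b"Marylebone")

plain = zlib.compress(new_row)                        # compressed on its own
primed = compress_with_dictionary(new_row, old_row)   # compressed against old version
print(len(new_row), len(plain), len(primed))
assert decompress_with_dictionary(primed, old_row) == new_row
```

The trade-off is that the dictionary must be kept and loaded for reading, which is why such schemes suit archived history better than frequently changing data.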
Faster than MySQL
In terms of performance, DoltHub cites its own Sysbench results, in which Dolt is slightly ahead of MySQL overall. Sysbench is a common benchmark tool for OLTP workloads that measures typical database operations such as inserts, updates, and reads. According to the documentation, Dolt is about 10 percent faster than MySQL on write operations, while reads are still about 5 percent slower. Performance was long considered one of the biggest hurdles for versioned databases.
Also new is beta support for vector data, meaning numerical embeddings such as those used by AI applications for semantic search or retrieval systems. Comparable functions are offered by PostgreSQL extensions such as pgvector and by specialized vector databases. However, the release notes explicitly state that the feature is still in beta.
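The core operation behind semantic search over embeddings is finding the stored vectors most similar to a query vector. The sketch below does this by brute force with cosine similarity in plain Python. It does not show Dolt's SQL syntax or index structure, which the article does not describe; production vector indexes typically use approximate search structures instead of scanning every row, and the document names and three-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def nearest(query, documents: dict, k: int = 2) -> list:
    """Return the ids of the k documents whose embeddings are most similar to query."""
    ranked = sorted(
        documents,
        key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 3-dimensional embeddings; real models produce hundreds of dimensions.
docs = {
    "db-intro": [0.9, 0.1, 0.0],
    "git-guide": [0.1, 0.9, 0.1],
    "dolt-blog": [0.7, 0.6, 0.1],
    "recipe": [0.0, 0.1, 0.9],
}
print(nearest([0.8, 0.5, 0.0], docs))  # ['dolt-blog', 'db-intro']
```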
Adaptive Storage of Large Values
With adaptive storage, Dolt also introduces a new storage strategy for the TEXT, JSON, GEOMETRY, and BLOB types. Depending on the content, the engine dynamically chooses a different encoding for large values or moves them out of the row. The developers compare the mechanism to PostgreSQL's TOAST, which likewise stores large values outside the actual table rows automatically. The goal is lower storage consumption and more efficient I/O, for example with large JSON documents or binary data.
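The basic idea of adaptive, TOAST-like storage can be sketched as a size threshold: small values stay inline in the row, large ones move to a separate store and the row keeps only a reference. In the following Python sketch, the 64-byte cutoff, the class name, and the hash-based reference are assumptions chosen for illustration; Dolt's actual rules are not described in the article, and the compression step that TOAST also applies is omitted.

```python
import hashlib
from dataclasses import dataclass, field

INLINE_THRESHOLD = 64  # hypothetical cutoff in bytes, not Dolt's real value


@dataclass
class AdaptiveTable:
    """Hypothetical table that stores small values inline and large ones out of row."""
    rows: dict = field(default_factory=dict)      # primary key -> (kind, payload)
    overflow: dict = field(default_factory=dict)  # content hash -> large value

    def put(self, key, value: bytes) -> None:
        if len(value) <= INLINE_THRESHOLD:
            # Small values stay inside the row, avoiding an extra lookup on reads.
            self.rows[key] = ("inline", value)
        else:
            # Large values move to a separate store; the row keeps only a reference.
            ref = hashlib.sha256(value).hexdigest()
            self.overflow[ref] = value
            self.rows[key] = ("ref", ref)

    def get(self, key) -> bytes:
        kind, payload = self.rows[key]
        return payload if kind == "inline" else self.overflow[payload]


t = AdaptiveTable()
t.put(1, b'{"short": true}')
t.put(2, b"x" * 1000)
print(t.rows[1][0], t.rows[2][0])  # inline ref
```

Because the overflow store in this sketch is keyed by content hash, an identical large value written to several rows is kept only once, while scans that do not need the large column can skip it entirely.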
Details on all changes can be found in the release notes on GitHub. According to the developers, Dolt 2.0 remains compatible in principle with databases created by 1.x. However, older 1.x clients cannot read every database created with 2.x. Anyone running mixed deployments or planning rollbacks should take this into account when migrating.
(fo)