OpenSharing aims to break down proprietary data silos in the AI world

Databricks hands over OpenSharing to the Linux Foundation. The protocol aims to standardize the exchange of AI models, agent skills, and data.

(Image: Gorodenkoff / Shutterstock.com)

With OpenSharing, Databricks has introduced an open protocol designed to standardize the secure exchange of data and AI assets such as models, agent skills, and unstructured data across platform, cloud, and organizational boundaries. The project is now hosted by the Linux Foundation as an open-source community project and is available on GitHub.

OpenSharing builds on Delta Sharing, an open-source protocol for secure data exchange that Databricks first introduced in 2021. While Delta Sharing focused on structured data in table formats like Delta Lake, OpenSharing significantly broadens the range of supported data and formats. In addition to tabular data, AI model artifacts, agent skills (i.e., functions and tools for autonomous agents), and unstructured data such as documents or media files can now be shared via a unified protocol. The protocol also adheres to the zero-copy principle: data is not replicated; clients access the source storage directly.

Technically, OpenSharing defines standardized APIs for discovery, authorization, and access. According to those responsible for the project, users can thus address a uniform set of interfaces, regardless of the underlying platform. The specific authentication mechanisms (for example, whether OAuth2 or OIDC is used) are not documented in detail in the material published so far. However, the complete specification is to be made available via the GitHub repository. What is known from the Delta Sharing architecture is that a sharing server acts as a control plane, while the actual data access takes place via pre-signed URLs pointing to cloud or object storage.
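The split between a control plane and pre-signed URLs can be illustrated with a minimal Python sketch. It is not the OpenSharing or Delta Sharing API: the signing key, the grant table, the host storage.example.com, and both function names are assumptions made up for this example, and real object stores use their own signing schemes. The sketch only shows the principle that the sharing server checks authorization and hands out a short-lived signed link, while the storage layer validates that link on its own.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration; not taken from any specification.
SIGNING_KEY = b"storage-signing-key"
GRANTS = {"recipient-token-123": {"sales/orders.parquet"}}


def _signature(path: str, expires: int) -> str:
    """HMAC over the object path and the expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def issue_presigned_url(token: str, path: str, ttl_seconds: int = 300,
                        now: Optional[float] = None) -> str:
    """Control plane: check the grant, then return a short-lived signed URL."""
    if path not in GRANTS.get(token, set()):
        raise PermissionError(f"token is not authorized for {path}")
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"https://storage.example.com/{path}?{query}"


def storage_accepts(url: str, now: Optional[float] = None) -> bool:
    """Data plane: the storage side validates expiry and signature itself."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parsed.path.lstrip("/"), expires)
    return hmac.compare_digest(sig, expected)
```

In this sketch the recipient never receives storage credentials, only a link that expires after `ttl_seconds` and is bound to exactly one object path. The data itself is never copied by the sharing server, which corresponds to the zero-copy principle described above.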

A significant innovation compared to Delta Sharing is the support for Apache Iceberg clients. This allows providers to serve both Delta- and Iceberg-based recipients via a single protocol. Operators of Lakehouse architectures thus benefit from reduced fragmentation in the open data ecosystem: engines like Spark, Trino, or Flink with Iceberg support gain a standardized access path to shared assets without having to rely on proprietary adapters.
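How a single share could serve both client types is not described in the announcement. The following Python sketch is therefore purely illustrative: it assumes that one set of data files is exposed together with a separate metadata location per table format, and that the server picks the metadata matching the client. The class `SharedTable`, the function `describe_for_client`, and all paths are hypothetical and not part of any published OpenSharing interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SharedTable:
    """One shared table with format-specific metadata (illustrative only)."""
    name: str
    data_files: Tuple[str, ...]
    metadata: Dict[str, str]  # table format -> metadata location


def describe_for_client(table: SharedTable, client_format: str) -> Dict[str, object]:
    """Return the metadata view that matches the client's table format."""
    if client_format not in table.metadata:
        supported = ", ".join(sorted(table.metadata))
        raise ValueError(
            f"{table.name} is not shared as {client_format}; available: {supported}"
        )
    files: List[str] = list(table.data_files)
    return {
        "format": client_format,
        "metadata_location": table.metadata[client_format],
        "data_files": files,
    }


# Hypothetical example table; the paths are made up.
orders = SharedTable(
    name="sales.orders",
    data_files=("data/part-0000.parquet", "data/part-0001.parquet"),
    metadata={
        "delta": "_delta_log/",
        "iceberg": "metadata/v1.metadata.json",
    },
)
```

Under these assumptions, a Delta client and an Iceberg client would call `describe_for_client(orders, "delta")` and `describe_for_client(orders, "iceberg")` respectively and receive different metadata locations for the same underlying files.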

The Linux Foundation provides vendor-neutral governance structures for OpenSharing. According to Jim Zemlin, CEO of the Linux Foundation, OpenSharing is intended to fulfill the “critical need for a common, vendor-neutral framework that enables organizations to exchange AI assets securely and interoperably across platforms and ecosystems.” The project thus joins other infrastructure standards under the Linux Foundation umbrella, including Kubernetes, RISC-V, and MCP (the latter via the Agentic AI Foundation, a foundation within the Linux Foundation), where neutral governance is intended to ensure broader acceptance.

According to Databricks co-founder and CTO Matei Zaharia, Delta Sharing has already proven that the industry prefers open standards. OpenSharing will extend this principle to the entire AI stack and the cross-platform ecosystem.

Companies with strict data protection and sovereignty requirements – for example, in regulated industries such as European banking, healthcare, or public administration – are likely to be interested in OpenSharing. Due to the zero-copy principle, data remains physically in the existing storage environment, whether that is a private data center or a European cloud. Cloud-based AI services access it via the protocol without the data having to be moved. This makes it easier to comply with GDPR requirements and data minimization approaches, because a separate copy no longer necessarily has to be created for each party involved.

Overview of the OpenSharing ecosystem

(Image: OpenSharing-IO)

Numerous companies are already positioning themselves as supporters at the project launch. Atlassian has introduced Data Shares in Atlassian Analytics and uses OpenSharing to enable access to cloud data at scale. SAP relies on the protocol in its Business Data Cloud, Stripe is integrating it natively into the Stripe Data Pipeline, and the London Stock Exchange Group (LSEG) is incorporating it into its “LSEG Everywhere” strategy.

The fact that SAP, a key European software provider, is adopting the protocol early on, and that storage manufacturers like NetApp and HPE, both with a strong presence in European data centers, have also announced their support, underscores the focus on regulated on-premises scenarios. OpenSharing is thus positioning itself as an open alternative to the proprietary data marketplaces of the major hyperscalers.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.