Volumetric videos overcome a major hurdle on the path to the mainstream
Dynamic Gaussian splats can now be streamed in good quality to mobile devices and VR headsets. This brings volumetric videos closer to the mainstream.
Volumetric videos offer hologram-like representations with a freely selectable viewing angle.
(Image: Tomislav Bezmalinović / heise medien)
Volumetric videos show people and objects as three-dimensional bodies that can be viewed from any chosen perspective. Thanks to advances in the optimization and transmission of dynamic Gaussian splats, they can now be streamed in good quality to mobile devices and even to standalone VR headsets, provided there is a sufficiently fast and stable internet connection.
The start-up Gracia AI is one of the pioneers of this technology and has released three demos this month that allow you to try out the streaming of dynamic Gaussian splats. Dynamic here means that the Gaussian splats represent a temporal sequence and movement, not just a snapshot.
Gaussian Splatting is a method for 3D reconstruction and representation. As a representation approach, it fundamentally differs from classic 3D graphics. Instead of building objects from meshes of connected polygons, usually triangles, a scene is described as a dense collection of small, spatially extended points. These so-called splats each carry information such as position, size, orientation, color, and transparency. In a dense form, these visual atoms create extremely realistic-looking people, objects, and environments.
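To make the list of per-splat attributes concrete, here is a minimal sketch of how a single splat could be represented. The class name `Splat`, the field layout, and the helper `scene_bytes` are illustrative assumptions, not a format used by Gracia AI or any specific renderer; real formats often store more data per splat (for example view-dependent color) and compress it.

```python
from dataclasses import dataclass


@dataclass
class Splat:
    """One 'visual atom' of a scene (hypothetical, simplified layout)."""
    position: tuple[float, float, float]         # center of the splat in scene space
    scale: tuple[float, float, float]            # size along each of its local axes
    rotation: tuple[float, float, float, float]  # orientation as a quaternion (w, x, y, z)
    color: tuple[float, float, float]            # RGB, each channel in [0, 1]
    opacity: float                               # 0.0 = fully transparent, 1.0 = opaque


def scene_bytes(num_splats: int, bytes_per_float: int = 4) -> int:
    """Uncompressed size of a scene under the simplified layout above."""
    floats_per_splat = 3 + 3 + 4 + 3 + 1  # position, scale, rotation, color, opacity
    return num_splats * floats_per_splat * bytes_per_float


example = Splat(
    position=(0.0, 1.5, -2.0),
    scale=(0.01, 0.01, 0.02),
    rotation=(1.0, 0.0, 0.0, 0.0),
    color=(0.8, 0.6, 0.5),
    opacity=0.5,
)
```

Under these assumptions, a scene is simply a large collection of such records, and its raw size grows linearly with the number of splats.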
The approach plays to its strengths particularly with complex and fine structures: hair, smoke, or other forms that are difficult to model can be captured much more naturally this way. At the same time, softer transitions and overall more coherent image impressions are created, which are often only achievable with classical polygon meshes at considerable additional effort.
Gaussian Splatting began primarily as a method for 3D reconstruction of the real world. Static subjects can be captured relatively quickly and easily with standard cameras, such as those on smartphones. Compared to other digitization methods like photogrammetry and NeRFs, the resulting representations are significantly more efficient and can now be rendered in real-time even on mobile devices.
Meanwhile, Gaussian Splatting is increasingly detaching itself from the original reconstruction context. In addition to AI-assisted generation of 3D environments (Google's Project Genie) and applications like immersive telepresence (Apple's improved Personas), the technology is also being used in film production and, in the future, likely in game development.
Gracia AI's demos cover three typical application scenarios from entertainment, crafts, and medicine: a four-minute musical performance, as well as short excerpts from a road bike repair guide and a physiotherapy session. A WebGPU-enabled browser like Google Chrome is required for correct playback. The volumetric videos start without installing an app first and without long buffering times, and users can view the scenes from any angle and zoom in and out.
In a VR headset, the added value of volumetric videos increases: people and objects appear in your own living room and develop a physical presence. Position and size can be adjusted by hand movements: from huge to life-size to miniature on the table. We tried out the function in the WebXR-enabled browser of the Meta Quest 3. Streaming also works on Apple Vision Pro, but without a passthrough view, as Apple has not yet enabled this function for WebXR.
How are dynamic Gaussian splats streamed?
Streaming dynamic Gaussian splats in the browser is not an entirely new development; the Chinese start-up 4DV already demonstrated corresponding approaches in 2025. However, their scenes still overwhelm some devices.
Gracia AI recommends a relatively high bandwidth of 80 Mbit/s for its streaming technology, which corresponds to a representation of 120,000 splats per frame. According to CEO Georgii Vysotskii, this is the maximum bitrate of the current streaming configuration. In many cases, however, less bandwidth is sufficient, depending on the complexity of the scene. What matters is how much movement it contains.
Dynamic Gaussian splatting fundamentally differs from conventional video: the displayed content does not consist of pre-rendered image sequences but of a collection of spatially extended 3D points that are rendered in real-time on the local device. Gracia AI's proprietary streaming technology does not transmit finished images, but keyframes and motion data of this 3D representation.
The advantage of this approach is obvious: instead of transmitting all 3D points for every point in time, only the changes between them are encoded. Since many parts of a scene change little over time, this saves considerable bandwidth. “It's essentially a codec-like approach for 2D video, applied to Gaussian splatting,” explains Vysotskii. Compared to downloaded versions of volumetric videos, the data rate is more than ten times lower, with almost the same visual quality.
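The general idea of sending a keyframe followed by changes can be illustrated with a small sketch. This is not Gracia AI's proprietary codec, whose details are not public; the function names `encode_delta` and `apply_delta`, the position-only frames, and the movement threshold are all assumptions chosen for illustration.

```python
Position = tuple[float, float, float]
Frame = list[Position]  # one position per splat, same splat order in every frame


def encode_delta(prev: Frame, curr: Frame, threshold: float = 1e-3) -> dict[int, Position]:
    """Keep only the splats that moved by more than `threshold` on any axis."""
    delta: dict[int, Position] = {}
    for index, (old, new) in enumerate(zip(prev, curr)):
        if max(abs(a - b) for a, b in zip(old, new)) > threshold:
            delta[index] = new
    return delta


def apply_delta(prev: Frame, delta: dict[int, Position]) -> Frame:
    """Rebuild the next frame on the receiving side from the previous one."""
    frame = list(prev)
    for index, position in delta.items():
        frame[index] = position
    return frame


# A keyframe with four splats is sent in full once.
keyframe: Frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# In the next frame only the second splat has moved.
next_frame: Frame = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

delta = encode_delta(keyframe, next_frame)
reconstructed = apply_delta(keyframe, delta)
```

In this toy example, only one of four splats needs to be transmitted for the second frame. A scene with little movement therefore produces small deltas, which matches the observation that the required bandwidth depends on how much movement a scene contains.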
The start-up's production pipeline allows different bitrates and quality levels to be set depending on the application and output device, for example, for streaming or download, and for mobile devices or more powerful stationary computers that can render more splats simultaneously. Specifically for streaming, the start-up has also tested a configuration with 17 Mbit/s or 15,000 displayed splats per frame, which is suitable for volumetric recordings with little movement. This bitrate is in the range of typical 4K video streaming.
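To put the two quoted bitrates into perspective, the following sketch converts them into transferred data volume. The helper name `stream_megabytes` is an assumption, and the calculation treats the bitrate as constant; since 80 Mbit/s is described as the maximum, the result for that configuration is an upper bound rather than a measured value.

```python
def stream_megabytes(bitrate_mbit_s: float, seconds: float) -> float:
    """Data volume in megabytes for a constant bitrate (1 byte = 8 bits)."""
    return bitrate_mbit_s * seconds / 8


four_minutes = 4 * 60  # length of the musical performance demo, in seconds

high_quality = stream_megabytes(80, four_minutes)   # maximum streaming configuration
low_motion = stream_megabytes(17, four_minutes)     # tested low-movement configuration

print(f"80 Mbit/s for 4 minutes: up to {high_quality:.0f} MB")
print(f"17 Mbit/s for 4 minutes: about {low_motion:.0f} MB")
```

Under these assumptions, a four-minute clip amounts to at most 2,400 MB at 80 Mbit/s and about 510 MB at 17 Mbit/s.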
Gaussian Splats: From local computation to streamed medium
Volumetric videos have long been considered a core promise of immersive technologies. They are associated with the vision that recordings of people, objects, and scenes are no longer limited to screens and canvases, but appear as holograms that can be freely placed in space and seem tangible.
Serious attempts in this direction were already made about ten years ago, parallel to the emergence of the first mass-market VR headsets: companies like Microsoft and Intel experimented with volumetric video formats and built their own recording studios for this purpose. In Germany, too, Volucap in Babelsberg was an early professional infrastructure for volumetric recordings. Although technically impressive, the approaches have so far failed due to high production costs, enormous data volumes, and a lack of distribution channels.
Volumetric videos are not videos in the classic sense, as they do not consist of fixed image sequences but of three-dimensional content rendered in real-time. In this respect, they resemble video games.
Most volumetric videos are based on recordings of real people, objects, and scenes. Their interactivity is usually limited to choosing the perspective and distance. Until now, such recordings were mainly made in specialized studios where a multitude of cameras capture a subject simultaneously from different angles. There are different approaches to processing and displaying this data: the latest and most promising is Gaussian Splatting.
There are static and dynamic Gaussian splats, but only dynamic Gaussian splats represent a temporal sequence and movement, thus qualifying as volumetric videos. For clear distinction, static Gaussian splats are also referred to as "3DGS" and dynamic Gaussian splats as "4DGS" (time as the fourth dimension).
Volumetric videos, whether based on Gaussian Splatting or other 3D reconstruction and representation methods, fundamentally differ from other immersive video formats, which offer significantly fewer degrees of freedom but are also much easier to produce.
Stereoscopic videos (also called "Spatial Videos" by Apple) offer slightly offset perspectives for both eyes, thus creating a 3D impression, but are limited to a mostly rectangular image format and a fixed perspective. So-called 180- and 360-degree videos expand the field of view, but do not change the fixed perspective. Approaches that calculate new viewing angles from videos using artificial intelligence are currently still highly limited, as they have to "invent" missing image information. Volumetric videos are thus considered the most powerful form of immersive video formats, but are still complex to produce.
In the future, volumetric videos are likely to be increasingly generated synthetically, for example through AI generation. The boundaries between volumetric video, interactive formats, and video games are likely to become increasingly blurred.
Gaussian Splatting solves some of the problems that held back earlier approaches. First, the technology significantly reduces capture costs. According to Vysotskii, the number of cameras required is steadily decreasing. Moreover, high-quality recordings are increasingly possible with comparatively inexpensive smartphone cameras or GoPros. Second, production is shifting from professional volumetric studios with dozens of cameras to portable camera rigs that capture a smaller field of view. In such cases, ten iPhones are sufficient, according to Vysotskii.
The technology and capture quality have also significantly improved. “The capture process is far less restrictive than it used to be in the days of mesh capture and mesh processing. Gaussian splatting is much more flexible in terms of camera placement, much more tolerant of different lighting environments, and far better when it comes to fabrics,” says the CEO.
(Image: Gracia AI)
This is also thanks to the rapid development of Gaussian Splatting in recent years. Initially, static splats could only be displayed in acceptable quality on powerful computers. With ongoing optimization, however, the process became more efficient and eventually reached mobile devices. In parallel, dynamic Gaussian splats emerged, which initially also required high computing power. However, in a short time, these moving representations were accelerated to the point where they could be used on weaker hardware. The streaming of dynamic Gaussian splats is now the next step: this turns a technology that was previously locally bound into a broadly accessible medium for the first time.
Volumetric Videos: Many Hurdles Remain
Despite this development, volumetric videos still face numerous hurdles. The biggest is that production and processing remain complex: for complete 360-degree recordings, as shown by Gracia AI in its demos, 40 to 60 cameras are still used simultaneously.
Vysotskii and co-founder Andrey Volodin originally founded the start-up with the intention of establishing a YouTube for volumetric videos. The goals are now more pragmatic: the focus is on developing the infrastructure and tools that enable studios to create volumetric content. A dedicated distribution platform is not currently planned.
Gracia AI primarily sees two commercial application areas for the technology: education and entertainment. It is already working with partners in both. In education, the company cites a project with Imperial College London, where manual processes are volumetrically captured for training purposes and prepared for VR or screen use. In entertainment, Gracia points to a project with the theme park PortAventura, where volumetric content is integrated into a location-based VR experience for visitors.
The start-up also sees great potential in immersive sports broadcasts. For these, Gracia AI is already working on the next big step: live streaming of dynamic Gaussian splats. An announcement is expected soon.
Ultimately, whether volumetric videos will catch on depends on the widespread adoption of immersive computer glasses. As long as these are not everyday products, their use for the public remains limited. Nevertheless, Gaussian Splatting and volumetric videos are among the most interesting developments in this area at present.
Many more volumetric videos are available for download in the start-up's app, which is available for Meta Quest, macOS, and Steam. An app specifically for Apple Vision Pro is still under development and, according to current plans, is scheduled for release in April.
(dahe)