Nvidia Lyra 2.0 generates persistent 3D environments from images

Nvidia has introduced Lyra 2.0, a framework that generates 3D environments from images, enabling stable exploration of larger spaces.

AI-generated coastal city with narrow streets, cafes, and historic buildings; view of the sea with boats in warm evening light.

Lyra 2.0 first generates a video from an image, and then creates an explorable 3D environment from it.

(Image: Nvidia)


With Lyra 2.0, Nvidia introduces an AI framework that addresses fundamental problems in generating 3D environments. World models such as Google's Genie 3 can already generate explorable, sometimes interactive 3D environments from simple text prompts and images. Problems arise during longer explorations, however: generative systems quickly lose track of previously generated areas, which therefore do not remain persistent. As a result, exploration of such 3D environments is often limited spatially or temporally.

With Lyra 2.0, the researchers present approaches that tackle two of these challenges: “spatial forgetting,” where previously seen areas fall out of the model's temporal context and are hallucinated when revisited, and “temporal drift,” where small errors accumulate during generation and increasingly distort the scene's geometry over time.


The new framework aims to curb these phenomena by storing 3D geometry for each image to restore previous spatial relationships and by precisely correcting temporal errors through self-augmented AI training. In contrast to its predecessor, Lyra 1.0, released in September 2025, Lyra 2.0 thus enables persistent 3D environments over longer exploration paths.

On the project page, which also includes video examples, the Nvidia researchers describe Lyra 2.0's workflow. Starting from an image, a 3D point cloud of the scene is first generated. The user can then explore the generated scene using a GUI and define camera paths that also lead to areas not yet visible. Along these paths, the model first generates suitable video sequences, which are then converted back into 3D point clouds. This allows the scene to be gradually expanded and refined. Nvidia's methods help ensure that already-generated areas remain consistent even over many expansion steps.
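The expansion loop described above can be sketched in a few lines of Python. This is a purely illustrative stand-in, not Nvidia's actual API: all function names and data structures are hypothetical, and the stubs return toy data where the real system would invoke learned video-generation and 3D-reconstruction models.

```python
# Hypothetical sketch of a Lyra 2.0-style scene-expansion loop.
# All names here are illustrative assumptions, not Nvidia's API.

def generate_video_along_path(camera_path):
    """Stand-in for the video model: yields one 'frame' per camera pose."""
    return [{"pose": pose, "pixels": None} for pose in camera_path]

def frames_to_points(frames):
    """Stand-in for 3D reconstruction: lifts each frame to a toy 3D point."""
    return {(x, y, 0.0) for (x, y) in (f["pose"] for f in frames)}

def expand_scene(initial_points, camera_paths):
    """Grow a persistent point cloud: new points are merged in,
    already-generated geometry is kept across expansion steps."""
    scene = set(initial_points)  # persistent 3D geometry
    for path in camera_paths:
        frames = generate_video_along_path(path)  # path -> video sequence
        scene |= frames_to_points(frames)         # video -> 3D points, merged
    return scene

scene = expand_scene(
    initial_points={(0.0, 0.0, 0.0)},
    camera_paths=[[(1.0, 0.0), (2.0, 0.0)], [(2.0, 1.0)]],
)
print(len(scene))  # prints 4: the initial point plus three reconstructed ones
```

The key design point the sketch mirrors is that the scene is an accumulating store: each expansion step only adds geometry, so areas generated earlier stay in place when the user's camera path returns to them.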


Finally, Lyra 2.0 exports the result as a Gaussian Splatting model and as classic mesh graphics. These can be imported into Nvidia's robotics simulation platform, Isaac Sim, where AI models can learn to navigate through the generated environments. The researchers also see potential for interactive exploration on screen or in virtual reality, as well as for use in general simulations.

Nvidia has released Lyra 2.0 as a research project via a GitHub repository and the model platform Hugging Face. Those who want to experiment with AI-generated 3D environments without prior knowledge can use offerings such as Marble from World Labs or the aforementioned Genie 3 from Google. The latter, however, is currently only available in the USA.

(olb)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.