Genie 3: Google's "world model" builds interactive environments
Google has presented Genie 3, a new "world model". It generates interactive environments that stay consistent over time and can also be used for robot training.
Interactive game worlds created with Genie 3.
(Image: Google)
Google DeepMind's new “world model” Genie 3 has a memory of several minutes: interactive worlds created with the AI tool are supposed to remain consistent for that long, Google explains. In an example video, a character paints on a wall and then turns away. When she looks back, the paint is still in the same place.
This kind of consistency is one of the biggest challenges AI models face when creating 3D worlds. Google's GameNGen model, for example, which is built specifically to generate video game worlds such as that of “Doom,” forgets after just a few seconds where enemies it has lost sight of are located and how much ammunition is left in the weapon.
Genie 3 helps with robot training
Genie 3 is a big step forward, Google DeepMind writes in a blog post. Not only does it have a longer memory than previous models, it is also more flexible than GameNGen and its Genie predecessors: Genie 3 creates interactive, dynamic worlds from a text prompt. These can be used for video games, but not only for them. As a further application, Google mentions training robotic AIs, which can move through the 3D worlds just as human users can. The 3D worlds of Genie 3 could also be used for virtual firefighting or disaster response exercises.
Humans can interact with the generated game worlds via a keyboard and steer characters through them in real time. New areas are generated as the user moves through the 3D world. While exploring a world generated by Genie 3, users can also add new elements via prompt.
The Genie 3 worlds are displayed at 24 frames per second and a maximum resolution of 720p. That is a higher resolution than that of older models such as GameNGen, but it lags behind the non-interactive video tool Veo, which can handle videos of up to 4K.
Step towards AGI
According to Google, Genie 3 is an important step towards AGI (Artificial General Intelligence) because the generated worlds can be used to train AI agents in detailed simulation environments. Nevertheless, the model is not yet perfect: the ways in which AI agents can interact with the simulated worlds are still limited. Furthermore, interaction between several agents in a shared simulation world still requires additional research.
According to Google, one application scenario for Genie 3 is the exploration of locations from the past. At the same time, Google admits that Genie 3 is currently not capable of reproducing real-world locations with great accuracy. Google also wants to improve the rendering of text in future versions. In the future, models such as Genie 3 are also meant to enable persistent interaction for hours rather than just several minutes.
Genie 3 will not be released for now. At this time, Google DeepMind only wants to make the “world model” available to selected researchers and creators, whose feedback will help the development team to make further progress.
(dahe)