Google DeepMind: Two Gemini AI models for more intelligent, useful robots
The two AI models, Gemini Robotics and Gemini Robotics-ER, are designed to give robots a better understanding of their environment and enable them to act more intelligently.
(Image: Google DeepMind)
Google DeepMind has introduced two AI models for robots, as the AI company announced on Wednesday. These are Gemini Robotics, an AI model for robotics based on Gemini 2.0, and Gemini Robotics-ER, an AI model that provides robots with enhanced spatial understanding. Both models are designed to enable robots to solve tasks precisely, even tasks they have not been trained on.
Gemini Robotics for a general understanding of the world
Gemini Robotics is based on Google's general AI model Gemini 2.0. It builds on Gemini's multimodal understanding of the world and transfers it to the real world. Physical actions are added as a new modality.
Specifically, this means, among other things, that robots can understand and respond to a much wider range of natural language commands than was the case with previous models.
In addition, the robot can adapt its behavior to the user's input. Robots can thus continuously monitor their environment, detect changes, and adjust their actions accordingly. Gemini Robotics draws on Gemini's general understanding of the world to generalize to new situations, which helps the robot solve tasks it has not been trained on: it can handle new objects and follow new instructions in previously unknown environments. This should also allow humans to control and monitor the robot more easily, whether in industrial or domestic settings, depending on where it is deployed.
Beyond adaptability and the ability to detect and react to changes in its environment, a robot also needs a certain level of dexterity. Gemini Robotics gives a robot the ability to manipulate objects precisely enough to perform even very complex, multi-step tasks.
The Gemini Robotics AI model is designed to run on robots of various designs, from two-armed systems such as Google's own ALOHA 2 platform to humanoid robots such as Apptronik's Apollo. On more complex robot systems, however, Gemini Robotics must be adapted somewhat so that the robots can also perform more difficult tasks.
Gemini Robotics-ER for spatial understanding
The AI model Gemini Robotics-ER (ER for Embodied Reasoning) improves robots' understanding of the world, in particular their spatial understanding, and combines that understanding with control of the robot. For example, the model can recognize objects lying on a table and knows where they are. From this, the robot can deduce how to grasp an object, how best to move its arm, and whether it can do so safely.
Google DeepMind is already working with several robotics companies, foremost Apptronik with its humanoid robot Apollo. Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools are among the so-called "trusted testers" who are given access to Gemini Robotics-ER.
(olb)