Project Astra: Gemini Live in 45 languages and live screen sharing
As of this month, you can talk to Gemini about what you see. The AI assistant now speaks 45 languages.
(Image: YueStock/Shutterstock.com)
At this year's MWC in Barcelona, Google is showcasing two new functions for Gemini. Both concern Gemini Live, the AI assistant you can talk to in real time via the Android and iOS app. It is being updated to Google's latest model, Gemini 2.0 Flash, the version of the multimodal model tailored specifically to fast, mobile use.
With the update, the Gemini app can understand and speak 45 languages. New is the ability to switch languages in the middle of a sentence: according to Google, you no longer need to change the phone's language settings. Simply continue speaking in a different language and "Gemini Live will be able to understand and respond". This function is available immediately.
Live video input is to follow later this month, Google says. This is one of the key functions of Project Astra, which Google presented at its last I/O developer conference. In one video, someone wearing smart glasses walked through a room and talked to the AI assistant about what they were seeing. Google still describes Project Astra as a "research prototype for a universal AI assistant". The live video capability is now coming to smartphones first, in the form of the Gemini app. Gemini also remembers what users have discussed with it, so that information can be drawn on again later.
In addition to video input, screen sharing will also become available in the future, making it possible to talk to Gemini Live about what is shown on the phone's screen. Google writes in a press release that this could help you buy a new pair of jeans, for example.
The visual AI functions will initially only be available for Pixel and Samsung devices.
AI assistants and smart glasses
Making AI assistants smarter and more practical is currently the goal of all major AI providers. OpenAI, for example, offers an AI agent called Operator, which can likewise be told in natural language to buy a pair of jeans. OpenAI also has Advanced Voice Mode; when announcing it, the company said the voice mode would gain visual capabilities like those Google has now made available. OpenAI, however, has not yet released them.
Meta offers a visual AI assistant primarily through its smart glasses, the Ray-Ban Meta Glasses, which let you look at your surroundings and ask questions about them directly. However, Meta AI, which handles the processing, is not yet available in the EU.
(emw)