Google I/O: Gemini Flash, Nano, 1.5 Pro, Gemma, Project Astra, Live, what?!

Google threw the Geminis around at this year's I/O. Why that is hard to keep track of, and what they all actually are.


The roof of the Google I/O event hall.

(Image: Google Press Kit)

This article was originally published in German and has been automatically translated.

They are called Flash, Nano and Ultra, come as Advanced or free of charge, and appear in countless forms: Google has countless AI products in its portfolio, most of them called Gemini with some kind of suffix. And while it may be easy for Google employees and the deeply involved to follow the innovations and presentations at this year's Google I/O, the average AI user's head quickly starts spinning and they tune out. Google is not doing itself any favors here. That's a shame, because the products are good!

It was a clear affront that OpenAI held its newly scheduled Spring Update the day before Google I/O, a live event streamed from its offices, according to the company. Google had clearly settled on the entire I/O program by that point; whatever was still shuffled around afterwards can only have been minimal. But unfortunately, OpenAI did one thing better than Google. And it is not necessarily about products, AI applications or models - Google can hardly be pushed off its leading position there. Sometimes, though, there is beauty in simplicity - and in success.

With GPT-4o, OpenAI demonstrated an omnimodel that natively processes text, audio and vision at the same time. That is fairly easy to understand. Even easier to grasp: during the presentation, OpenAI simply spent a long time showing various examples of GPT-4o in use in ChatGPT. It is these examples that win a lot of people over. You will soon be able to point your smartphone camera at something and ask a question about it? That's cool. That it isn't even available in this form yet gets somewhat lost along the way - the whole world is shouting that OpenAI is showing the future.

It is certainly not just OpenAI's one-day head start that makes Google look a little behind, even though it will offer the same capability in the future - integrated into several services, no less. And that is where the mess begins: Gemini Live, the coming AI-infused search, Project Astra - all of them somehow include exactly this ability to point the camera at something while talking to an AI assistant and asking questions. But is it the same thing everywhere? Somehow not, and somehow yes. To clear things up a little, below is a list of Google's products and an explanation of what they are.

This is not the first time Google has struggled with product naming; it was similar with its communication tools, from Allo to Meet. AI is simply more deeply embedded everywhere now, which explains why keeping the naming simple might not be easy. Still: on the one hand, Gemini is a large language model (LLM) that exists in different versions and under different names. On the other, Gemini is also the chatbot that can be reached under the URL of the same name and is available as an app. Naturally, this causes confusion.
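For developers, the distinction becomes most tangible in the API: the model variants listed further below are addressed by explicit identifiers such as gemini-1.5-pro or gemini-1.5-flash, while "Gemini" without a suffix is the consumer chatbot on the web and in the app. A minimal sketch, assuming the google-generativeai Python SDK and a valid API key (the prompt text is only illustrative):

```python
# Minimal sketch (not from the article): a developer picks a Gemini *model*
# by its explicit identifier via the API, while "Gemini" the chatbot is the
# consumer app. Assumes the google-generativeai SDK and a valid API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key

# Model variants are selected by name: "gemini-1.5-pro" for the large
# context window, "gemini-1.5-flash" for the faster, cheaper sibling.
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Explain the difference between Gemini the model and Gemini the app."
)
print(response.text)
```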

The flood of information and products is simply too much. Google's I/O is an overstuffed package that bursts open along the mail carrier's route and unfortunately never reaches the recipient intact. And yet the contents would be so good!

Then there is Search. There were rumors that OpenAI would launch a pure AI search engine. It did not. And doing so probably would not have been wise anyway. The majority of search queries are for things like companies or stores, where people simply want to get to the homepage - not to some AI-summarized information about the company or store. Google knows and understands this. Google also knows that its strength is the gigantic knowledge base it has built up, for example on locations, events, opening hours, transportation options and plain facts. A voice model or omnimodel cannot replace that; it can only supplement it. Liz Reid, Head of Search at Google, explains that they will use both: the knowledge base and AI. Behind this also lies the old saying that the devil always piles onto the biggest heap - those who already have a lot tend to get even more. In the eyes of users, Google clearly intends to assert itself as the top dog.

Did anyone watching I/O realize this? Amid all the Flashes and Photos and Lives and tokens, probably not. I/O is, at its core, a developer conference. Those who are deeply immersed in the subject matter can follow along and perhaps cherry-pick the content that matters to them. Unfortunately, in this form the event hardly reaches the average AI user or the general public.

Gemini: Google's multimodal language model, which is also the name of the AI chatbot previously known as Bard.

Gemini 1.5 Pro: Google's most powerful Gemini model, which offers a particularly large context window - with one million tokens and now two million tokens for developers.

Gemini Nano: The smallest and most efficient version of the AI model, optimized for mobile devices.

Gemini Ultra: The heavyweight of the Gemini AI model family, which can take on particularly complex tasks.

Gemini 1.5 Flash: An AI model trained on the basis of Gemini 1.5 Pro that is particularly fast and cost-effective.

Gemini: An AI chatbot that is available as an app and on the web.

Gemini Live: Google's vision of a future AI assistant or search, which lets you use both the camera and your voice to ask Gemini questions.

Project Astra: Google's work on the AI assistant of the future, also referred to as agents that act on behalf of users.

Ask Photos with Gemini: The function that allows you to ask for information in the Photos app - in natural language.

Veo: Google's video AI, which generates videos from a prompt.

Imagen: Google's image generator.

Lyria: Google's music AI.

Music AI Sandbox: Covers Google's AI applications related to music. The term Privacy Sandbox, by contrast, stands for the slowly progressing replacement of third-party cookies in the Chrome browser - surprisingly, Google uses the word Sandbox in other contexts too.

Ask with Video: Camera and voice usage in search, which basically works like Gemini Live and Project Astra.

Gemini Side Panel: The AI assistant that has already moved into Google Workspace, similar to Microsoft Copilot.

Med-Gemini: Google's multimodal model for medical applications.

Gems: Personalized chatbots.

AI Overviews: The AI-generated summaries for a search query that appear at the top of the results page.

Gemma: Google's family of open models.

PaliGemma: An open vision-language model.

SynthID: Google's work on digital watermarking for AI-generated content.

Trillium: The sixth generation of TPUs (Tensor Processing Units) - Google's special AI chips.

(emw)