Google: New AI model also runs on laptops with only 16GB RAM

Google releases Gemma 4 12B: The local open-source model already runs on laptops with only 16 GByte RAM.

Google's AI lab is constantly bringing new models to market.

(Image: Primakov/Shutterstock)

Jun 8, 2026 at 10:39 am CEST

By Carolin Riethmüller

Google DeepMind has introduced Gemma 4 12B, a new open AI model intended to enable multimodal agents directly on standard laptops. The 12-billion-parameter model natively processes text, images, and, as the first model of its size to do so, audio. It needs only 16 GByte of RAM or graphics memory (VRAM) to run. Released under the Apache 2.0 license, it is freely available to developers and companies.

This lowers the barrier to entry for running Google's AI agents locally. Gemini Intelligence, Google's own on-device AI for Android smartphones, has high hardware requirements. Gemma 4 12B, by contrast, deliberately targets a broad audience.
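The article does not say at which numerical precision the 16 GByte figure applies. As a rough illustration only, the memory needed just to hold a model's weights is the parameter count times the bytes stored per parameter. The sketch below assumes four common precisions (fp32, bf16, int8, int4), counts decimal gigabytes, and ignores activations, the KV cache and runtime overhead. These are illustrative assumptions, not Google's published configuration.

```python
# Back-of-the-envelope estimate of the memory needed just to hold
# the weights of a 12-billion-parameter model at common precisions.
# Assumptions: weights only (no activations, KV cache, or runtime
# overhead); the precision list is illustrative, not Google's setup.

PARAMS = 12e9  # 12 billion parameters, as stated in the article

# Bytes used to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

BUDGET_GB = 16  # memory budget named in the article


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, size)
        verdict = "fits" if gb <= BUDGET_GB else "does not fit"
        print(f"{name}: {gb:.1f} GB -> {verdict} in {BUDGET_GB} GB")
```

Under these assumptions, unquantized bf16 weights alone come to about 24 GB. That would suggest the 16 GByte figure relies on some form of quantization, at 8 bits or below, although the article does not confirm this.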

Architecture without separate encoders

A second strength of the model lies in its unified architecture. As Google explains in its blog, Gemma 4 12B omits separate vision and audio encoders entirely. Conventional multimodal models from Google typically rely on dedicated encoder modules that translate image and audio data before the language model processes them. Gemma 4 12B takes a different approach: the input is processed directly by the LLM backbone.

Performance close to a model twice its size

Within the Gemma 4 family, Google positions the 12B model between the E4B edge variant, designed for smartphones and IoT devices such as the Raspberry Pi, and the larger 26B Mixture-of-Experts (MoE) model. According to Google, it trails the larger model only slightly in benchmarks. Without a dedicated GPU, however, inference is likely to be noticeably slower.

How the new model compares with models from other providers that also run in 16 GB of memory remains to be seen.