Local speech recognition for Windows – without the cloud

Local AI features no longer just on Copilot+ PCs: Microsoft brings Windows AI APIs to CPUs and GPUs – plus new speech recognition.

A man speaks into a microphone, notes fly out of his mouth.

(Image: Moritz Förster / KI / iX)


At its Build 2026 developer conference, Microsoft unveiled several innovations for the Windows AI APIs. The focus is on a new API for local speech recognition and broader hardware support. The aim is to enable AI functions to run on significantly more Windows 11 systems in the future. Microsoft says this will make it easier for developers to use local AI and reduce dependence on the cloud.

The Windows AI APIs provide developers with pre-built AI functions that run locally on the device. Developers do not need to find, run, or optimize a suitable model themselves.

Existing functions include text recognition (OCR), image description, image upscaling with Super Resolution, object recognition, and removal of image content. They are built on Windows ML and on models provided by Microsoft.

A new speech recognition API has been added, which converts speech to text locally. It supports both real-time and batch transcriptions and processes input via microphone, as an audio stream, or from audio files. Processing runs directly on the device and does not require an internet connection.

According to Microsoft, the API is suitable for dictation functions, automatic subtitles, transcription tools, and accessibility applications, among others. For now, however, the public preview supports English only. Microsoft has announced a gradual expansion to other languages and countries.

In addition to the new interface, Microsoft is expanding the hardware support for the Windows AI APIs. Previously, many functions ran primarily on Copilot+ PCs with a Neural Processing Unit (NPU). In the future, the APIs will increasingly run on conventional processors and graphics cards as well.

For example, the new Speech Recognition API supports both NPUs and CPUs. The Windows language model, previously used for text functions, will also run on suitable dedicated GPUs, and Video Super Resolution will also run on CPUs in the future.
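
For developers, this kind of hardware support amounts to a fallback chain: use the preferred accelerator if it is present, otherwise fall back to the next supported one. The following Python sketch only illustrates that idea. The names `Backend`, `SUPPORTED`, and `pick_backend` are hypothetical and not part of the Windows AI APIs. The support table is derived solely from the examples mentioned above, with the NPU assumed as the previous baseline.

```python
from enum import Enum
from typing import Optional


class Backend(Enum):
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


# Hypothetical table in order of preference, based only on the examples
# in the article. This is not an official compatibility matrix, and the
# NPU entry for each feature is an assumption.
SUPPORTED = {
    "speech_recognition": [Backend.NPU, Backend.CPU],
    "text_model": [Backend.NPU, Backend.GPU],
    "video_super_resolution": [Backend.NPU, Backend.CPU],
}


def pick_backend(feature: str, available: set) -> Optional[Backend]:
    """Return the first preferred backend that the device offers, or None."""
    for backend in SUPPORTED.get(feature, []):
        if backend in available:
            return backend
    return None


# A desktop PC without an NPU but with a dedicated GPU.
desktop = {Backend.CPU, Backend.GPU}
speech_backend = pick_backend("speech_recognition", desktop)
text_backend = pick_backend("text_model", desktop)
```

In this sketch, the desktop PC would run speech recognition on the CPU and the text model on the GPU. A feature with no matching backend yields `None`, which an application could use to hide the function.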

This primarily increases the potential user base for developers. Applications with local AI functions can be used on significantly more Windows 11 devices without requiring special AI hardware. This is likely to facilitate the adoption of local AI applications, especially on desktop PCs and workstations with powerful graphics cards.

Microsoft also points out that the models behind the Windows AI APIs are not automatically installed on every device. Instead, Windows only downloads a model when an application actually requests it. This keeps storage and bandwidth requirements low.
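
The on-demand principle can be pictured as a cache that fetches a model on first use and reuses it afterwards. The following Python sketch is purely illustrative. `ModelStore` and `fake_fetch` are hypothetical names, and the download is simulated. It does not show how Windows implements this mechanism.

```python
from typing import Callable, Dict


class ModelStore:
    """Fetches a model the first time it is requested, then reuses it."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # stand-in for the actual download step
        self._cache: Dict[str, bytes] = {}
        self.downloads = 0  # counts how often a download was triggered

    def get(self, name: str) -> bytes:
        # Only download if no application has requested this model before.
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
            self.downloads += 1
        return self._cache[name]


def fake_fetch(name: str) -> bytes:
    """Simulated download that returns placeholder bytes."""
    return f"weights-for-{name}".encode()


store = ModelStore(fake_fetch)
first = store.get("speech")   # triggers the simulated download
second = store.get("speech")  # served from the cache
```

Models that no application ever requests never enter the cache, which is the effect the article describes for storage space and bandwidth.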

(fo)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.