FFmpeg 8.0 integrates Whisper: Local audio transcription without the cloud

The upcoming FFmpeg 8.0 will offer Whisper as an optional feature, making OpenAI's speech transcription usable directly in FFmpeg workflows.

FFmpeg, the widely used multimedia framework, is integrating Whisper: the new feature lets users transcribe their audio content automatically, directly within FFmpeg. Whisper is a machine-learning-based speech recognition system from OpenAI. The feature is part of FFmpeg 8.0, which is due to be released in the coming weeks.

The new whisper filter in FFmpeg runs locally and does not send any content to the cloud. It requires the whisper.cpp library; the feature is then enabled at build time with the --enable-whisper option. By default, the filter detects the language automatically; Whisper can transcribe audio recordings in over 90 languages. If in doubt, however, a language can be specified explicitly. The same applies to GPU use, which is enabled by default and can be switched off.
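As a rough illustration of the settings described above, the following Python sketch assembles the option string for the filter. It assumes the option names `model`, `language` and `use_gpu` from the FFmpeg filter documentation; the helper `whisper_filter` and the model file name are hypothetical and serve only as an example.

```python
def whisper_filter(model_path, language="auto", use_gpu=True):
    """Build the option string for FFmpeg's whisper audio filter.

    Assumption: the filter accepts the options model, language and use_gpu.
    The defaults here mirror the article: automatic language detection, GPU on.
    """
    options = {
        "model": model_path,  # path to a whisper.cpp model file
        "language": language,  # "auto" leaves detection to Whisper
        "use_gpu": "true" if use_gpu else "false",
    }
    return "whisper=" + ":".join(f"{key}={value}" for key, value in options.items())


# Hypothetical model file name; force German and disable the GPU.
print(whisper_filter("ggml-base.bin", language="de", use_gpu=False))
```

The resulting string would be passed to ffmpeg via the -af option. Paths that contain colons or other special characters would need escaping under FFmpeg's filter syntax, which this sketch does not handle.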

If desired, the new filter can also create SRT subtitle files for videos or transcribe the audio of live broadcasts. The transcribed text can be processed further within FFmpeg or passed on to other applications in an automated workflow. Until now, users and developers had to combine several tools for such tasks, which made integration more difficult.
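How SRT generation might look in an automated workflow is sketched below in Python. The sketch only builds the command line and does not run it. It assumes the filter options `model`, `destination` and `format` from the FFmpeg filter documentation; the function `srt_command` and all file names are hypothetical.

```python
import shlex


def srt_command(input_path, model_path, srt_path):
    """Return an ffmpeg argument list that writes a transcript as an SRT file.

    Assumption: the whisper filter accepts model, destination and format.
    """
    audio_filter = f"whisper=model={model_path}:destination={srt_path}:format=srt"
    return [
        "ffmpeg",
        "-i", input_path,
        "-vn",               # ignore the video stream, only audio is needed
        "-af", audio_filter,
        "-f", "null", "-",   # discard the media output; the filter writes the SRT
    ]


# Hypothetical file names for illustration.
cmd = srt_command("input.mp4", "ggml-base.bin", "output.srt")
print(shlex.join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` on a system whose FFmpeg build was configured with --enable-whisper.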

FFmpeg is released as open-source software, as is Whisper. A first insight into the integration can be found here.

(fo)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.