MacWhisper: Local audio transcription distinguishes speakers
MacWhisper is one of the most popular apps for recording conversations on the Mac. Using local AI, it can now also tell people apart.
MacWhisper with Speaker Detection: A long-cherished wish.
(Image: Jordi Bruin)
Software for transcribing conversations, video calls and interviews has made significant progress in recent years. One innovation is that this is also possible locally on the computer – thanks to open-source models from OpenAI and others. The most popular software for this on the Mac is called MacWhisper and comes from Dutch developer Jordi Bruin. He has now fulfilled a long-awaited feature request from his users: It is finally possible to automatically distinguish between speakers. The feature has been available since version 12.0.1, which was released this month.
Who is speaking right now?
“Now when you transcribe an interview, meeting or conversation, MacWhisper automatically recognizes different speakers, groups their statements and labels them, making your transcripts clearer and easier to navigate,” writes Bruin in the release notes. The function was one of the features users requested most often. As before, transcription runs on the Mac itself, so no data ends up in the cloud (e.g., for training).
“All processing takes place privately on your Mac, nothing is sent to a server, and it also works offline.” The feature was implemented in collaboration with ArgMax, using its WhisperKit Pro and SpeakerKit models, which must be selected in the app accordingly. It is also possible to select a language in advance or have it recognized automatically. In practice, this works particularly well if the conversation uses only one language. If several languages are mixed, the result is sometimes word salad.
Server models also available
Speaker recognition is part of MacWhisper Pro, so it is not free to use. Activation costs a not exactly cheap 59 euros. In return, there is also text and grammar correction via server models, batch transcription and support for distilled models. The Pro version can also transcribe YouTube videos and supports various other cloud models from OpenAI, Anthropic, X.ai and via Ollama. A feature overview can be found here. Bruin gives students, non-profits, and journalists a 30 percent discount if they contact him by email. Most recently, support for ElevenLabs Scribe and Deepgram Nova was also added.
MacWhisper supports over 100 languages. The app can also capture audio directly from various Mac apps, so nothing has to be saved as a file first. The hardware requirement is a Mac with an M chip, i.e., Apple Silicon. Updates are included in the price; there is no subscription.
(bsc)