LM Studio allows testing of local DeepSeek models on Apple Silicon Macs

The hype surrounding the Chinese language model DeepSeek is huge. If you don't want to try it out via the web or app, you can also do it locally with LM Studio.


Dive deep with DeepSeek: This also works locally on the Mac.

(Image: created by Mac & i with Midjourney)


Large language models do not always have to run on the servers of a large company such as OpenAI (ChatGPT), Anthropic (Claude) or the recently launched DeepSeek (R1). Slimmed-down versions, created from the resource-hungry server models by means of distillation, also run locally. On Macs, this is particularly easy with the free LM Studio app, which requires an Apple Silicon machine.

In contrast to more professional approaches such as Ollama, LM Studio does not require using the command line. It can run a local server if desired, but it does not have to. The app integrates the use of LLMs into its own interface, which combines discovering, installing and using models. All the well-known open-source models are available, including Llama, DeepSeek, Qwen and Mistral, and you can choose variants optimized for Apple's MLX format, which makes better use of Apple Silicon's unified memory.
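If you do enable the optional local server, it exposes an OpenAI-compatible HTTP API, so existing client code can talk to the locally loaded model. Below is a minimal sketch using the openai Python package as a generic client; the address http://localhost:1234/v1 and the model identifier are assumptions, so check the server settings and model name shown in LM Studio.

```python
# Minimal sketch: talking to LM Studio's optional local server via its
# OpenAI-compatible API. The port and model name below are assumptions;
# LM Studio shows the actual values in its server/developer view.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # assumed default address of the local server
    api_key="lm-studio",                  # the key is not checked for local use
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",  # hypothetical identifier; use the name listed in LM Studio
    messages=[
        {"role": "user", "content": "Explain Apple Silicon's unified memory in two sentences."}
    ],
)

print(response.choices[0].message.content)
```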

In order for the models to run properly, you need a computer with sufficient processing power and RAM, as well as storage space. The models to be downloaded start at around 4 GB but can also reach 40 GB or more. The quality of the output varies considerably, as a short test showed: smaller models tended to hallucinate more than larger ones, and output speed differed as well. The open-source version of DeepSeek is also censored in line with Chinese government requirements, so it refuses, for example, to talk about the 1989 Tiananmen Square massacre, although there are modified models that circumvent this.
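As a rough rule of thumb, the download size of a quantized model is roughly the parameter count times the bits per weight divided by eight. The following sketch applies that estimate and compares it with free disk space; the numbers are illustrative and ignore format overhead.

```python
# Rough sketch: estimating quantized model download sizes and checking free disk
# space. Assumes file size ~ parameters * bits / 8 and ignores format overhead.
import shutil


def estimated_size_gb(params_billion: float, bits: int) -> float:
    """Approximate on-disk size of a quantized model in gigabytes."""
    return params_billion * 1e9 * bits / 8 / 1e9


free_gb = shutil.disk_usage("/").free / 1e9

for label, params_billion, bits in [
    ("8B model, 4-bit", 8, 4),
    ("70B model, 4-bit", 70, 4),
]:
    size = estimated_size_gb(params_billion, bits)
    print(f"{label}: ~{size:.0f} GB download, fits on disk: {size < free_gb}")
```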


By far the best output in our test came from a large DeepSeek model, a distilled variant of R1 based on Llama 70B with quantization. It also shows its reasoning process (click on "Thinking"), i.e. how the model arrives at its answer. Waiting times were 20 to 40 seconds, and the fan of our M3 machine spun up more often than with smaller models; this variant alone took up 40 GB.
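If you access such a model programmatically rather than through the app, the reasoning is typically part of the raw text: R1-style distill models tend to wrap their chain of thought in <think>...</think> tags, which LM Studio then renders under "Thinking". That tag convention is an assumption about the raw output; a small sketch for separating reasoning from the final answer could look like this.

```python
# Sketch: splitting an R1-style response into reasoning and answer. Assumes the
# chain of thought is wrapped in <think>...</think> tags in the raw model output.
import re


def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


raw = "<think>The user asks for a summary, so I will keep it short.</think>Here is the summary."
thoughts, answer = split_reasoning(raw)
print("Reasoning:", thoughts)
print("Answer:", answer)
```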

When experimenting with the models, you should keep an eye on free SSD space, which can fill up quickly given the model sizes. The models are stored in the "models" directory under ".lmstudio" in the user's home folder, but can easily be managed and deleted via the LM Studio GUI. The app is also available for sufficiently fast x86 Windows computers, ARM Windows machines and Linux.
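To see at a glance how much space the downloads occupy, you can also scan that folder directly. The sketch below looks for GGUF files under ~/.lmstudio/models; MLX models are stored in other formats, so the filter is an assumption, and deleting is still best done via the LM Studio GUI.

```python
# Sketch: summing up the disk space used by downloaded models under
# ~/.lmstudio/models. Only GGUF files are counted here; MLX models use
# other file formats, so adjust the pattern as needed.
from pathlib import Path

models_dir = Path.home() / ".lmstudio" / "models"

total_gb = 0.0
for model_file in sorted(models_dir.rglob("*.gguf")):
    size_gb = model_file.stat().st_size / 1e9
    total_gb += size_gb
    print(f"{size_gb:6.1f} GB  {model_file.relative_to(models_dir)}")

print(f"{total_gb:6.1f} GB total")
```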


(bsc)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.