Apple provides more details on MLX – including the Neural Accelerator in the M5
Apple's notebooks, desktops, and workstations are well-suited for running local AI systems. The key to this is the MLX software.
The MLX website: AI models close to the hardware on Apple Silicon.
(Image: Apple)
On its machine learning website, Apple has published more information about its MLX AI framework and about the use of the integrated AI accelerators (Neural Accelerators) in the M5 processor. This is particularly interesting for users who want to run local AI systems such as large language models (LLMs), an increasingly popular trend. Most recently, it was demonstrated how the large Chinese model Kimi K2 Thinking could be run on a Mac Studio cluster of four workstations with 512 GB of RAM each, networked via Thunderbolt 5. Smaller configurations such as a MacBook Pro M3 Max with 128 GB of RAM can also easily run medium-sized models like gpt-oss-120b from OpenAI. MLX variants of the models give such LLMs an additional speed boost.
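As a minimal sketch of what running such a model locally can look like, the following uses the mlx-lm package's load and generate helpers; the exact model repository name (an mlx-community build on Hugging Face) is an illustrative assumption, not one named by Apple or in the article.

```python
# Minimal sketch: run an MLX-converted LLM locally with mlx-lm
# (pip install mlx-lm). The repo name below is an illustrative
# mlx-community build on Hugging Face, not taken from the article.
from mlx_lm import load, generate

# Downloads the quantized MLX weights on first use and loads them
# into unified memory on the Apple Silicon machine.
model, tokenizer = load("mlx-community/Qwen3-8B-4bit")

prompt = "Summarize what the MLX framework does in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```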
TB5 Networking and New Neural Accelerator
“With MLX, users can efficiently explore and run LLMs on the Mac. It allows researchers to experiment with new inference or fine-tuning techniques or test AI techniques in a private environment on their hardware. MLX works with all Apple Silicon systems,” says Apple.
The macOS 26.2 beta, currently in testing, adds support for low-latency Thunderbolt 5 networking and for the aforementioned Neural Accelerators built into the 14-inch MacBook Pro with M5. The latter are intended to accelerate certain machine learning workloads and to speed up the execution of AI models (inference).
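For the Thunderbolt clustering use case, MLX ships a distributed API; the following is a hedged sketch of its basic shape (a process group plus collective operations), assuming the mlx.launch helper is used to start the script on several hosts. Details may differ between MLX versions.

```python
# Hedged sketch of MLX's distributed API, which can run across several
# Macs networked e.g. via Thunderbolt. Typically started with the
# mlx.launch helper, e.g.: mlx.launch --hosts mac1,mac2 this_script.py
import mlx.core as mx

group = mx.distributed.init()  # join the process group set up by the launcher
print(f"node {group.rank()} of {group.size()}")

# A collective operation: sum an array across all participating machines.
x = mx.ones(4)
y = mx.distributed.all_sum(x)
print(y)  # each node sees the element-wise sum over all nodes
```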
Waiting for M5 Pro, M5 Max, and M5 Ultra
Since there are currently no machines with an M5 Pro, M5 Max, or M5 Ultra, and the M5 only addresses a maximum of 32 GB of RAM, an M4 Max or M3 Ultra might be the better choice at the moment. However, according to Apple, models that fit into the M5's RAM show a significantly faster "time to first token," i.e., the time required to output the first token: the speed-up ranges from 3.3 times (gpt-oss-20b-MXFP4-Q4) to 4 times (Qwen3-8B-MLX-4bit).
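Time to first token is easy to measure yourself; the sketch below assumes mlx-lm's streaming API (stream_generate) and reuses the same illustrative model repository as above.

```python
# Rough sketch: measure "time to first token" with mlx-lm's streaming
# API. The model repo name is illustrative; API details may vary
# between mlx-lm versions.
import time
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")

start = time.perf_counter()
for chunk in stream_generate(model, tokenizer, "Hello!", max_tokens=8):
    # The first yielded chunk marks the end of prompt processing,
    # i.e., the point at which the first token becomes available.
    print(f"time to first token: {time.perf_counter() - start:.2f} s")
    break
```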
Apple also provides tips in its documentation on how to work with MLX. Those interested in further details can find the MLX-LM project on GitHub for running various models and for fine-tuning. A dedicated MLX community on Hugging Face collects tips, tricks, and ready-made models. In tools like LM Studio, MLX variants of well-known models are also quick to find; a sketch of how such variants are produced follows below.
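How such MLX variants come about can be sketched with mlx-lm's convert helper, which quantizes a Hugging Face checkpoint into MLX format; the source repository and output path below are illustrative assumptions.

```python
# Hedged sketch: produce a quantized MLX variant of a Hugging Face
# checkpoint with mlx-lm's convert helper. Source repo and output
# path are illustrative; flags may differ between mlx-lm versions.
from mlx_lm import convert

convert(
    "Qwen/Qwen3-8B",               # source checkpoint (illustrative)
    mlx_path="qwen3-8b-mlx-4bit",  # local output directory
    quantize=True,                 # 4-bit quantization by default
)
```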
(bsc)