AI Cluster: Four Macs with 2 TB can run the giant model Kimi K2 Thinking

Apple has integrated the protocol developed by Exo for creating Thunderbolt 5 clusters into macOS. This allows even very large AI models to run.

Person at a Mac Studio: It doesn't get much more powerful in terms of AI.

(Image: Apple)

A year ago, the small British company Exo Labs demonstrated how to build a fairly cost-effective yet impressive AI cluster from several Mac minis via Thunderbolt 5 (Video from c't 3003). Apple has now made improvements to macOS 26, alias Tahoe, that facilitate the construction of such systems. In practice, this means that from macOS 26.2, which is expected in December, even (very) large local AI models can access the combined hardware with low latency via Apple's in-house AI framework MLX and the Exo software.

In a demonstration reported by the IT blog Engadget, Apple showed a Thunderbolt 5 (TB5) network of four Mac Studio M3 Ultra machines, each equipped with 512 GB of RAM and using Exo 1.0 from Exo Labs for networking. The cluster ran Kimi K2 Thinking, a model more than one terabyte in size from the Chinese company Moonshot AI. It is considered almost on par with GPT-5, even though it runs locally. The problem: hardly anyone owns a single computer with enough RAM. With TB5 clustering, Apple now offers a way to make this possible, at least for well-funded researchers and professional developers.

Of course, the fun is still not cheap. According to the price list, one of the machines, equipped with 512 GB of RAM and the smallest 1 TB SSD, costs 11,674 euros. Four of them come to almost 47,000 euros. With the feature built into macOS 26.2, the cluster runs the model with one trillion parameters without a giant PC and special graphics cards – apparently at sufficient speed.
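The totals quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the machine count, RAM size and unit price come from the article, while the function and constant names are made up for illustration.

```python
# Figures taken from the article; names are illustrative only.
NUM_MACHINES = 4
RAM_PER_MACHINE_GB = 512
PRICE_PER_MACHINE_EUR = 11_674


def cluster_totals(num_machines, ram_per_machine_gb, price_per_machine_eur):
    """Return (total RAM in GB, total price in euros) for identical machines."""
    total_ram_gb = num_machines * ram_per_machine_gb
    total_price_eur = num_machines * price_per_machine_eur
    return total_ram_gb, total_price_eur


total_ram_gb, total_price_eur = cluster_totals(
    NUM_MACHINES, RAM_PER_MACHINE_GB, PRICE_PER_MACHINE_EUR
)
print(f"Combined RAM: {total_ram_gb} GB")
print(f"Combined price: {total_price_eur} euros")
```

Four times 512 GB gives the 2 TB from the headline, and four times 11,674 euros gives 46,696 euros, which matches the "almost 47,000 euros" in the text.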

According to Engadget, the cluster drew a total of 500 watts at peak, a tenth of what a comparable GPU cluster consumes. No special hardware is needed beyond the four Mac Studios; Apple uses the 80 Gb/s that TB5 can easily carry and, as mentioned, has tuned latency internally in macOS 26.2. Apparently, no hub that would limit the speed is needed. The technology can also be used with the Mac mini M4 Pro, a much cheaper alternative. However, that machine is only available with a maximum of 64 GB of RAM per unit.
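To put these numbers into perspective, here is a rough Python sketch. The link speed, peak power and RAM limit come from the article. The model size of 1,024 GB is an assumption (the article only says "over one terabyte"), and the helper names are hypothetical. The results are theoretical lower bounds that ignore protocol overhead, memory needed by the system, and how many machines Exo can actually link.

```python
import math

# Figures from the article
TB5_LINK_GBIT_PER_S = 80
PEAK_POWER_W = 500
MAC_MINI_MAX_RAM_GB = 64

# Assumption: "over one terabyte" is approximated as 1,024 GB
ASSUMED_MODEL_SIZE_GB = 1024


def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_per_s / 8


def min_units_for_model(model_size_gb, ram_per_unit_gb):
    """Smallest number of units whose combined RAM is at least the model size."""
    return math.ceil(model_size_gb / ram_per_unit_gb)


link_gbyte_per_s = gbit_to_gbyte_per_s(TB5_LINK_GBIT_PER_S)
mini_units = min_units_for_model(ASSUMED_MODEL_SIZE_GB, MAC_MINI_MAX_RAM_GB)
# "A tenth of a comparable GPU cluster" implies ten times the Mac figure.
implied_gpu_cluster_w = PEAK_POWER_W * 10

print(f"Theoretical TB5 throughput: {link_gbyte_per_s} GB/s")
print(f"Mac minis needed for RAM alone: at least {mini_units}")
print(f"Implied GPU cluster peak: {implied_gpu_cluster_w} W")
```

Under these assumptions, a model of this size would need at least 16 Mac minis for its weights alone, which shows why the 512 GB Mac Studio is the more practical building block for models in the Kimi K2 Thinking class.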

Another new feature in macOS 26.2, according to the report: Apple's MLX gains access to the new Neural Accelerators in the M5 chip. However, Apple's latest SoC is currently only available in the MacBook Pro M5, which only has Thunderbolt 4. Macs with M5 Pro and M5 Max are not expected until next spring.

(bsc)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.