Local AI: Nemotron 70B on four Mac minis M4 Pro with Thunderbolt 5 interconnect
Exo Labs, which specializes in local AI, shows how four compact Macs can be clustered to run one large LLM. The setup runs Nemotron 70B – and soon Llama 405B.
Four Mac minis for one LLM: local and freely configurable.
(Image: Screenshot Alex Cheema / X)
Apple's new Mac mini computers with the M4 SoC also offer some advantages for professional users: they are comparatively inexpensive, very compact and powerful at the same time. That also makes them attractive for AI applications such as local large language models (LLMs). A company specializing in such applications has now demonstrated a high-end setup of four Mac mini machines with the M4 Pro, cooperating via a Thunderbolt 5 interconnect. A video with initial performance figures has been published on X.
Eight tokens per second, benchmarks to follow
The experiment was conducted by the start-up Exo Labs. According to founder Alex Cheema, the small cluster achieves an output of eight tokens per second running the open-source Nemotron 70B model; scaling up to Llama 405B is possible. Exact benchmark values will follow "soon", Cheema says, and a preview can be found here. Exo's own software is available on GitHub, where it trended after Cheema's X post.
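For a sense of how such a cluster is used in practice: Exo advertises a ChatGPT-compatible HTTP API, so local tools can talk to the cluster much like to any hosted model. The following minimal Python sketch illustrates that pattern; the port and the model identifier are assumptions that depend on the installed exo version, so the repository's README is authoritative.

```python
# Hypothetical sketch: querying an exo cluster via its
# ChatGPT-compatible HTTP API. The port (52415) and the model
# identifier are assumptions -- consult the exo README for the
# values matching your installation.
import json
import urllib.request

payload = {
    "model": "nemotron-70b",  # assumed identifier
    "messages": [
        {"role": "user", "content": "Hello from a Mac mini cluster"}
    ],
}
req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",  # assumed port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.load(resp)
print(answer["choices"][0]["message"]["content"])
```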
The Thunderbolt 5 interconnect allows throughput rates of up to 80 Gbit/s. Cheema did not initially provide details on the configuration of the Mac minis. In Germany, the M4 Pro models start at 1,649 euros with 24 GB RAM and a 512 GB SSD; that configuration integrates an SoC with 12 CPU and 16 GPU cores, while 14 CPU and 20 GPU cores cost an additional 230 euros. The maximum RAM configuration is 64 GB, at a whopping surcharge of 690 euros (in each case on top of the aforementioned 1,649 euros).
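Whether the entry-level 24 GB configuration suffices can be estimated from the quantized weights alone. The following back-of-envelope sketch is our own estimate and deliberately ignores the operating system, KV cache and runtime overhead; it suggests why Nemotron 70B fits comfortably across four base-model minis, while Llama 405B would call for the 64 GB option.

```python
# Back-of-envelope memory estimate for sharding quantized model
# weights across four Mac minis. Real deployments also need room
# for the OS, KV cache and activations, so these are lower bounds.
def weight_gb(params_billion: float, bits: int) -> float:
    """Size of the raw weights in gigabytes."""
    return params_billion * 1e9 * bits / 8 / 1e9

for name, params in [("Nemotron 70B", 70), ("Llama 405B", 405)]:
    total = weight_gb(params, 4)   # 4-bit quantization
    per_node = total / 4           # split over four machines
    print(f"{name}: ~{total:.0f} GB total, ~{per_node:.0f} GB per node")

# Output:
# Nemotron 70B: ~35 GB total, ~9 GB per node   -> fits 24 GB minis
# Llama 405B: ~202 GB total, ~51 GB per node   -> needs the 64 GB option
```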
Local AI has numerous advantages, but costs electricity
According to Cheema, up to 30 tokens per second with Nemotron 70B (4-bit quantized) could be possible with such a setup. "We're getting there," he wrote. An upcoming Mac Studio setup with more RAM and an M4 Ultra would likely overtake the cluster in its current state while also consuming less power – something Cheema himself acknowledges.
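The 30-token figure is plausible as a memory-bandwidth bound: when generating, each node has to stream its shard of the weights from RAM once per token. A rough estimate, assuming Apple's stated 273 GB/s memory bandwidth for the M4 Pro and ideal sharding across the four machines, lands in the same range.

```python
# Rough upper bound on decode speed, assuming generation is purely
# memory-bandwidth bound: every token requires streaming each node's
# weight shard from RAM once. Interconnect latency, KV-cache reads
# and compute are ignored, so real throughput lands lower.
BANDWIDTH_GBS = 273                # Apple's figure for the M4 Pro
WEIGHTS_GB = 70 * 4 / 8            # Nemotron 70B at 4 bits: ~35 GB
NODES = 4

per_node_gb = WEIGHTS_GB / NODES   # ~8.75 GB read per token per node
tokens_per_s = BANDWIDTH_GBS / per_node_gb
print(f"~{tokens_per_s:.0f} tokens/s upper bound")  # ~31 tokens/s
```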
Local LLMs have various advantages. For example, they are privacy-friendly because no data has to flow to large AI providers (or cloud hosts), and you can configure the language model as you wish.
(bsc)