EXO Labs (@exolabs) 's Twitter Profile
EXO Labs

@exolabs

AI on any device.
12 Days of EXO: blog.exolabs.net
We're hiring: exolabs.net

ID: 1772318878934118400

Link: https://github.com/exo-explore/exo · Joined: 25-03-2024 17:45:30

295 Tweets

35.35K Followers

3 Following

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Apple's timing could not be better with this.

The M3 Ultra 512GB Mac Studio fits perfectly with massive sparse MoEs like DeepSeek V3/R1.

2 M3 Ultra 512GB Mac Studios with <a href="/exolabs/">EXO Labs</a> is all you need to run the full, unquantized DeepSeek R1 at home.

The first requirement for
Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Apple have given me early access to 2 maxed out M3 Ultra 512GB Mac Studios ahead of the public release.

I will run the full DeepSeek R1 (8-bit) using <a href="/exolabs/">EXO Labs</a> or die trying.

The 1TB(!!) of Unified Memory should be enough for all 671B parameters + context.
EXO Labs (@exolabs) 's Twitter Profile Photo

The full DeepSeek R1 has 671B parameters @ 8-bit = 671GB. 2 x M3 Ultra 512GB Mac Studios with exo (connected with TB5) should be enough to run it with a long context.
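The arithmetic behind that claim can be sketched in a few lines (all figures taken from the tweet itself; the headroom split between KV cache and OS overhead is my own framing, not EXO's):

```python
# Back-of-envelope memory math for running DeepSeek R1 at 8-bit.
params = 671e9          # DeepSeek R1 total parameter count
bytes_per_param = 1     # 8-bit quantization = 1 byte per parameter

weights_gb = params * bytes_per_param / 1e9
print(f"weights: {weights_gb:.0f} GB")              # 671 GB

total_memory_gb = 2 * 512                           # two M3 Ultra 512GB Mac Studios
headroom_gb = total_memory_gb - weights_gb
print(f"headroom for context/KV cache: {headroom_gb:.0f} GB")  # 353 GB
```

The ~353GB left over is what makes the "long context" part plausible, since the KV cache grows with context length.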

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Running DeepSeek R1 on my desk.

Uses EXO Labs with a Thunderbolt 5 interconnect (80Gbps) to run the full (671B, 8-bit) DeepSeek R1 distributed across 2 M3 Ultra 512GB Mac Studios (1TB total Unified Memory).

Runs at 11 tok/sec. Theoretical max is ~20 tok/sec.
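The ~20 tok/sec ceiling is consistent with a simple memory-bandwidth roofline. A sketch of that estimate, assuming the M3 Ultra's advertised ~819 GB/s unified-memory bandwidth and DeepSeek R1's ~37B active parameters per token (both numbers are my assumptions, not stated in the tweet):

```python
# Roofline estimate for decode throughput: each generated token must stream
# every *active* parameter from memory at least once, so bandwidth bounds speed.
mem_bandwidth_gb_s = 819    # M3 Ultra unified-memory bandwidth (assumed spec)
active_params = 37e9        # DeepSeek R1 active parameters per token (sparse MoE)
bytes_per_param = 1         # 8-bit weights

active_gb_per_token = active_params * bytes_per_param / 1e9
max_tok_per_sec = mem_bandwidth_gb_s / active_gb_per_token
print(f"~{max_tok_per_sec:.0f} tok/sec upper bound")  # ~22 tok/sec
```

The measured 11 tok/sec landing at roughly half the roofline is typical once interconnect latency and pipeline bubbles between the two machines are counted.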

EXO Labs (@exolabs) 's Twitter Profile Photo

exo v2 generalises to any workload expressible in the tiny corp's tinygrad. This includes cryptography, e.g. FHE.

We published research on 1-bit FHE inference last year that exploits the matmul-free nature of 1-bit models. Overhead is still massive, but it's a promising direction for private AI.
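The "matmul-free" property being exploited can be illustrated in a few lines. This is my own toy sketch, not EXO's published code: with weights constrained to {-1, +1}, a matrix-vector product reduces to additions and subtractions, which is exactly the operation that FHE schemes with cheap homomorphic addition handle well.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.choice([-1, 1], size=(4, 8))   # 1-bit weight matrix
x = rng.integers(-5, 5, size=8)        # input vector (e.g. quantized activations)

# A standard matmul...
y_matmul = W @ x
# ...is identical to summing x where W=+1 and subtracting it where W=-1,
# i.e. no multiplications are ever needed.
y_addsub = np.where(W == 1, x, -x).sum(axis=1)

assert np.array_equal(y_matmul, y_addsub)
```

Under FHE, each homomorphic multiplication is far more expensive than an addition, so eliminating multiplications from the hot path is the whole point.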

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

how <a href="/exolabs/">EXO Labs</a> v2 will enable 128 mac mini clusters (unironically):

- search-based orchestration of ML jobs across arbitrary distributed device topologies (bitter lesson)
- low-interconnect distributed algorithms like DiLoCo/SPARTA (10,000x reduction in communication compared to DDP)
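The communication savings in that second bullet come from syncing once per H local steps instead of once per step. A minimal DiLoCo-style loop, sketched from the published algorithm rather than exo's code (the real outer step uses Nesterov momentum; plain averaging keeps the sketch short):

```python
import numpy as np

def local_steps(theta, grads, lr=0.1):
    """H inner SGD steps on one worker, with zero communication."""
    for g in grads:
        theta = theta - lr * g
    return theta

H, workers, dim = 100, 4, 3
rng = np.random.default_rng(0)
theta_global = np.zeros(dim)

# Each worker trains locally, then sends only its parameter delta
# ("pseudo-gradient"): 1 sync per H steps, vs 1 sync per step in DDP.
deltas = []
for w in range(workers):
    grads = rng.normal(size=(H, dim))   # stand-in for real per-step gradients
    theta_local = local_steps(theta_global.copy(), grads)
    deltas.append(theta_global - theta_local)

# Outer step: apply the averaged pseudo-gradient to the global model.
outer_lr = 1.0
theta_global = theta_global - outer_lr * np.mean(deltas, axis=0)
```

With H=100 this alone is a 100x communication cut; stacking techniques like sparse/quantized deltas (the SPARTA direction) is how much larger factors are reached.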
Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Meta 🤝 Apple

Llama 4 + Apple Silicon is a match made in heaven.

Here's why: Like DeepSeek V3/R1, all of the new Llama 4 variants are massive sparse MoE models. They have a massive number of parameters, but only a small fraction of those are active each time a token is generated.
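A toy top-k MoE router makes the fit concrete (my illustrative sketch, not Llama 4's actual routing): all experts must be *resident* in memory, which unified memory provides in bulk, but only k of them are read per token, which keeps the bandwidth bill small.

```python
import numpy as np

n_experts, k, d = 16, 2, 8
rng = np.random.default_rng(0)
experts = rng.normal(size=(n_experts, d, d))   # every expert resident in memory
router = rng.normal(size=(d, n_experts))

def moe_forward(x):
    scores = x @ router
    topk = np.argsort(scores)[-k:]             # pick the k highest-scoring experts
    gates = np.exp(scores[topk])
    gates /= gates.sum()                       # softmax over the chosen experts
    y = sum(g * (experts[i] @ x) for g, i in zip(gates, topk))
    return y, topk

x = rng.normal(size=d)
y, active = moe_forward(x)
print(len(active), "of", n_experts, "experts touched per token")
```

Capacity scales with total parameters while per-token compute and memory traffic scale only with the active experts, which is the match with large, moderate-bandwidth unified memory.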
Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

This is 10x easier in the new EXO Labs - one click to install the exo background service + auto discovery + topology-aware heterogeneous device compute plans.

MLX is awesome tech. Try running ML in production on other non-NVIDIA hardware and you'll see that most ML software