EXO Labs (@exolabs) 's Twitter Profile
EXO Labs

@exolabs

AI on any device.
12 Days of EXO: blog.exolabs.net
We're hiring: exolabs.net

ID: 1772318878934118400

Link: https://github.com/exo-explore/exo · Joined: 25-03-2024 17:45:30

295 Tweets

35.35K Followers

3 Following

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Apple's timing could not be better with this.

The M3 Ultra 512GB Mac Studio fits perfectly with massive sparse MoEs like DeepSeek V3/R1.

2 M3 Ultra 512GB Mac Studios with <a href="/exolabs/">EXO Labs</a> is all you need to run the full, unquantized DeepSeek R1 at home.

The first requirement for
Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Apple have given me early access to 2 maxed out M3 Ultra 512GB Mac Studios ahead of the public release.

I will run the full DeepSeek R1 (8-bit) using <a href="/exolabs/">EXO Labs</a> or die trying.

The 1TB(!!) of Unified Memory should be enough for all 671B parameters + context.
EXO Labs (@exolabs) 's Twitter Profile Photo

The full DeepSeek R1 has 671B parameters @ 8-bit = 671GB. 2 x M3 Ultra 512GB Mac Studios with exo (connected with TB5) should be enough to run it with a long context.
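The arithmetic behind that claim can be sketched in a few lines (all figures taken from the tweet itself; the headroom split between KV cache and OS overhead is my own framing, not EXO's):

```python
# Back-of-envelope memory math for running DeepSeek R1 at 8-bit.
params = 671e9          # DeepSeek R1 total parameter count
bytes_per_param = 1     # 8-bit quantization = 1 byte per parameter

weights_gb = params * bytes_per_param / 1e9
print(f"weights: {weights_gb:.0f} GB")              # 671 GB

total_memory_gb = 2 * 512                           # two M3 Ultra 512GB Mac Studios
headroom_gb = total_memory_gb - weights_gb
print(f"headroom for context/KV cache: {headroom_gb:.0f} GB")  # 353 GB
```

The ~353GB left over is what makes the "long context" part plausible, since the KV cache grows with context length.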

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Running DeepSeek R1 on my desk.

Uses EXO Labs with a Thunderbolt 5 interconnect (80Gbps) to run the full (671B, 8-bit) DeepSeek R1 distributed across 2 M3 Ultra 512GB Mac Studios (1TB total Unified Memory).

Runs at 11 tok/sec. Theoretical max is ~20 tok/sec.
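The ~20 tok/sec ceiling is consistent with a simple memory-bandwidth roofline. A sketch of that estimate, assuming the M3 Ultra's advertised ~819 GB/s unified-memory bandwidth and DeepSeek R1's ~37B active parameters per token (both numbers are my assumptions, not stated in the tweet):

```python
# Roofline estimate for decode throughput: each generated token must stream
# every *active* parameter from memory at least once, so bandwidth bounds speed.
mem_bandwidth_gb_s = 819    # M3 Ultra unified-memory bandwidth (assumed spec)
active_params = 37e9        # DeepSeek R1 active parameters per token (sparse MoE)
bytes_per_param = 1         # 8-bit weights

active_gb_per_token = active_params * bytes_per_param / 1e9
max_tok_per_sec = mem_bandwidth_gb_s / active_gb_per_token
print(f"~{max_tok_per_sec:.0f} tok/sec upper bound")  # ~22 tok/sec
```

The measured 11 tok/sec landing at roughly half the roofline is typical once interconnect latency and pipeline bubbles between the two machines are counted.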

EXO Labs (@exolabs) 's Twitter Profile Photo

exo v2 generalises to any workload expressible in the tiny corp's tinygrad. This includes cryptography, e.g. FHE.

We published research on 1-bit FHE inference last year that exploits the matmul-free nature of 1-bit models. Overhead is still massive, but it's a promising direction for private AI.
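The "matmul-free" property being exploited can be illustrated in a few lines. This is my own toy sketch, not EXO's published code: with weights constrained to {-1, +1}, a matrix-vector product reduces to additions and subtractions, which is exactly the operation that FHE schemes with cheap homomorphic addition handle well.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.choice([-1, 1], size=(4, 8))   # 1-bit weight matrix
x = rng.integers(-5, 5, size=8)        # input vector (e.g. quantized activations)

# A standard matmul...
y_matmul = W @ x
# ...is identical to summing x where W=+1 and subtracting it where W=-1,
# i.e. no multiplications are ever needed.
y_addsub = np.where(W == 1, x, -x).sum(axis=1)

assert np.array_equal(y_matmul, y_addsub)
```

Under FHE, each homomorphic multiplication is far more expensive than an addition, so eliminating multiplications from the hot path is the whole point.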

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

how <a href="/exolabs/">EXO Labs</a> v2 will enable 128 mac mini clusters (unironically):

- search-based orchestration of ML jobs across arbitrary distributed device topologies (bitter lesson)
- low-interconnect distributed algorithms like DiLoCo/SPARTA (10,000x reduction in communication compared to DDP)
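The communication savings in that second bullet come from syncing once per H local steps instead of once per step. A minimal DiLoCo-style loop, sketched from the published algorithm rather than exo's code (the real outer step uses Nesterov momentum; plain averaging keeps the sketch short):

```python
import numpy as np

def local_steps(theta, grads, lr=0.1):
    """H inner SGD steps on one worker, with zero communication."""
    for g in grads:
        theta = theta - lr * g
    return theta

H, workers, dim = 100, 4, 3
rng = np.random.default_rng(0)
theta_global = np.zeros(dim)

# Each worker trains locally, then sends only its parameter delta
# ("pseudo-gradient"): 1 sync per H steps, vs 1 sync per step in DDP.
deltas = []
for w in range(workers):
    grads = rng.normal(size=(H, dim))   # stand-in for real per-step gradients
    theta_local = local_steps(theta_global.copy(), grads)
    deltas.append(theta_global - theta_local)

# Outer step: apply the averaged pseudo-gradient to the global model.
outer_lr = 1.0
theta_global = theta_global - outer_lr * np.mean(deltas, axis=0)
```

With H=100 this alone is a 100x communication cut; stacking techniques like sparse/quantized deltas (the SPARTA direction) is how much larger factors are reached.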
Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

Meta 🤝 Apple

Llama 4 + Apple Silicon is a match made in heaven.

Here's why: Like DeepSeek V3/R1, all of the new Llama 4 variants are massive sparse MoE models. They have a massive number of parameters, but only a small fraction of those are active each time a token is generated.
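A toy top-k MoE router makes the fit concrete (my illustrative sketch, not Llama 4's actual routing): all experts must be *resident* in memory, which unified memory provides in bulk, but only k of them are read per token, which keeps the bandwidth bill small.

```python
import numpy as np

n_experts, k, d = 16, 2, 8
rng = np.random.default_rng(0)
experts = rng.normal(size=(n_experts, d, d))   # every expert resident in memory
router = rng.normal(size=(d, n_experts))

def moe_forward(x):
    scores = x @ router
    topk = np.argsort(scores)[-k:]             # pick the k highest-scoring experts
    gates = np.exp(scores[topk])
    gates /= gates.sum()                       # softmax over the chosen experts
    y = sum(g * (experts[i] @ x) for g, i in zip(gates, topk))
    return y, topk

x = rng.normal(size=d)
y, active = moe_forward(x)
print(len(active), "of", n_experts, "experts touched per token")
```

Capacity scales with total parameters while per-token compute and memory traffic scale only with the active experts, which is the match with large, moderate-bandwidth unified memory.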
Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

This is 10x easier in the new EXO Labs - one click to install the exo background service + auto discovery + topology-aware heterogeneous device compute plans.

MLX is awesome tech. Try running ML in production on other non-NVIDIA hardware and you'll see that most ML software