Zach Mueller (@thezachmueller)'s Twitter Profile
Zach Mueller

@thezachmueller

muellerzr.bsky.social

huggingface.co/muellerzr

ID: 721018777664626688

Website: https://muellerzr.github.io/ · Joined: 15-04-2016 16:54:07

16K Tweets

11K Followers

410 Following

Zach Mueller (@thezachmueller)

If you want to understand:
* How to choose the right setup for distributed training
* Which costly mistakes to avoid
* The latest techniques for training at scale

Fourteen other speakers and I will make it happen.

tokenbender (@tokenbender)

Would add this little nugget:
overtrained models are hard to SFT
overSFTed models are hard to RL
overRLed models are hard to course-correct with ICL
Feel free to verify.

Teknium (e/λ) (@teknium1)

Yeah, and a somewhat related sidenote: comparing H100s/200s or B200s with 6000 Adas or B6000s is crazy, because not only are you losing VRAM and capped on FLOPs, you're also stuck with PCIe, and on top of that, no NVLink. It's really a harsh division between their true datacenter

Zach Mueller (@thezachmueller)

A few weeks ago Hugo Bowne-Anderson was kind enough to bring me on and yap about distributed training. It was a wonderful time! The episode is out now on all major platforms: youtube.com/live/76NAtzWZ2…

Wanchao Liang (@wanchao_)

I’ll be presenting TorchTitan, a PyTorch-native platform for training foundation models, tomorrow at the ES-FoMo workshop at ICML 2025! Come say hi!

Zach Mueller (@thezachmueller)

When I left my local club of saltwater aquarium people (Zach lore; this is a real thing) to go start my world in tech, this is what their president said to me as I left. It shook me for a bit, but makes sense looking back

Nathan Lambert (@natolambert)

Not falling for OpenAI’s hype-vague posting about the new IMO gold model with ā€œgeneral purpose RLā€ and whatever else ā€œbreakthrough.ā€ Google also got IMO gold (harder than mastering AIME), but remember, simple ideas scale best.

tenderizzation (@tenderizzation)

launching the training run by manually sshing into each node and starting the script because you cba to figure out how the cluster management software works
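The manual launch the tweet jokes about can be sketched as a loop over a host list (a hedged illustration only; the node names, `train.py`, and the log path are placeholders, not anything from the tweet):

```python
import subprocess

# Placeholder inventory -- in reality this is the bookkeeping a cluster
# manager (e.g. a scheduler like Slurm) would track for you.
NODES = ["node1", "node2", "node3"]
TRAIN_CMD = "nohup python train.py > train.log 2>&1 &"

def launch_everywhere(nodes, cmd, runner=subprocess.run):
    """SSH into each node and background the training script by hand.

    `runner` is injectable so the loop can be exercised without real SSH access.
    """
    for node in nodes:
        # -f sends ssh to the background after auth; nohup detaches the
        # remote job so the loop can move on to the next node
        runner(["ssh", "-f", node, cmd], check=True)
```

Host lists, detached jobs, and per-node logs scattered everywhere: exactly the bookkeeping that cluster management software exists to replace.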

Zach Mueller (@thezachmueller)

What are people’s current solutions when you’ve run out of ChatGPT Deep Research runs for the month? It’s easier for me to get ChatGPT to actually adhere to my directions than Gemini in terms of output format. Just make like 3 accounts so you don’t pay $200/mo?

Zach Mueller (@thezachmueller)

High-level tools like šŸ¤— transformers’ Trainer help you start fast, but long-term they can hurt. In this (free) series, I’ll share:
1. Why these abstractions stall growth
2. How low-level tools like accelerate fix that
3. And how they supercharge your high-level use later