Zach Mueller (@thezachmueller)'s Twitter Profile
Zach Mueller

@thezachmueller

muellerzr.bsky.social

huggingface.co/muellerzr

ID: 721018777664626688

Website: https://muellerzr.github.io/ · Joined: 15-04-2016 16:54:07

16K Tweets

11K Followers

410 Following

Zach Mueller (@thezachmueller)

If you want to understand:
* How to choose the right setup for distributed training
* Which costly mistakes to avoid
* The latest techniques for training at scale

Fourteen other speakers and I will make it happen.

tokenbender (@tokenbender)

Would add this little nugget:
overtrained models are hard to SFT
overSFTed models are hard to RL
overRLed models are hard to course-correct with ICL
Feel free to verify.

Teknium (e/λ) (@teknium1)

Yeah, and a somewhat related sidenote: comparing H100s/200s or B200s with 6000 Adas or B6000s is crazy, because not only are you losing VRAM and capped on FLOPs, you're also stuck with PCIe, and on top of that, no NVLink. It's really a harsh division between their true datacenter

Zach Mueller (@thezachmueller)

A few weeks ago Hugo Bowne-Anderson was kind enough to bring me on and yap about distributed training. It was a wonderful time! The episode is out now on all major platforms: youtube.com/live/76NAtzWZ2…

Wanchao Liang (@wanchao_)

I’ll be presenting TorchTitan, a PyTorch-native platform for training foundation models, tomorrow at the ES-FoMo workshop at ICML 2025! Come say hi!

Zach Mueller (@thezachmueller)

When I left my local club of saltwater aquarium people (Zach lore; this is a real thing) to go start my world in tech, this is what their president said to me as I left. It shook me for a bit, but makes sense looking back

Nathan Lambert (@natolambert)

Not falling for OpenAI’s hype-vague posting about the new IMO gold model with ā€œgeneral purpose RLā€ and whatever else ā€œbreakthrough.ā€ Google also got IMO gold (harder than mastering AIME), but remember, simple ideas scale best.

tenderizzation (@tenderizzation)

launching the training run by manually sshing into each node and starting the script because you cba to figure out how the cluster management software works
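The manual launch the tweet jokes about can be sketched as a loop over a host list (a hedged illustration only; the node names, `train.py`, and the log path are placeholders, not anything from the tweet):

```python
import subprocess

# Placeholder inventory -- in reality this is the bookkeeping a cluster
# manager (e.g. a scheduler like Slurm) would track for you.
NODES = ["node1", "node2", "node3"]
TRAIN_CMD = "nohup python train.py > train.log 2>&1 &"

def launch_everywhere(nodes, cmd, runner=subprocess.run):
    """SSH into each node and background the training script by hand.

    `runner` is injectable so the loop can be exercised without real SSH access.
    """
    for node in nodes:
        # -f sends ssh to the background after auth; nohup detaches the
        # remote job so the loop can move on to the next node
        runner(["ssh", "-f", node, cmd], check=True)
```

Host lists, detached jobs, and per-node logs scattered everywhere: exactly the bookkeeping that cluster management software exists to replace.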

Zach Mueller (@thezachmueller)

What are people’s current solutions when you’ve run out of ChatGPT Deep Research runs for the month? It’s easier for me to get ChatGPT to actually adhere to my directions than Gemini in terms of output format. Just make like 3 accounts so you don’t pay $200/mo?

Zach Mueller (@thezachmueller)

High-level tools like šŸ¤— transformers’ Trainer help you start fast, but long-term they can hurt. In this (free) series, I’ll share:
1. Why these abstractions stall growth
2. How low-level tools like accelerate fix that
3. And how they supercharge your high-level use later