Mahesh Sathiamoorthy (@madiator)'s Twitter Profile
Mahesh Sathiamoorthy

@madiator

Post training and Data Curation.
Co-founder @bespokelabsai.
Ex-GoogleDeepMind.

ID: 13614262

Link: http://smahesh.com
Joined: 18-02-2008 08:15:50

3.3K Tweets

12.12K Followers

1.1K Following

Mayee Chen (@mayeechen)

There are many algorithms for constructing pre-training data mixtures—which one should we use? Turns out: many of them fall under one framework, have similar issues, and can be improved with a straightforward modification. Introducing Aioli! 🧄 1/9


Mahesh Sathiamoorthy (@madiator)

India just needs more traffic signals and more people who will follow those signals. There are unnecessary slowdowns because people go any which way at intersections.

Mahesh Sathiamoorthy (@madiator)

We are organizing a dinner today for researchers in RL and Data. There are a limited number of slots remaining. Please DM me to join.

Qwen (@alibaba_qwen)

🚀 GSPO: Group Sequence Policy Optimization — a breakthrough RL algorithm for scaling LMs! 🔹 Sequence-level optimization — theoretically sound & matching reward 🔹 Rock-solid stability for large MoE models — no collapse 🔹 No hacks like Routing Replay — simpler, cleaner

Mahesh Sathiamoorthy (@madiator)

TIL that Jensen commissioned a chip specially for Carmack, for Quake. And since that chip worked really well, Carmack told his followers to use Nvidia chips for Quake, which helped Nvidia a great deal.