Sham Kakade (@shamkakade6) 's Twitter Profile
Sham Kakade

@shamkakade6

Harvard Professor.
Full stack ML and AI.
Co-director of the Kempner Institute for the Study of Natural and Artificial Intelligence.

ID: 1069106254927273984

https://shamulent.github.io · Joined 02-12-2018 05:49:28

418 Tweets

14.14K Followers

463 Following

Rosie Zhao (@rosieyzh) 's Twitter Profile Photo

Excited to be attending 🇸🇬#ICLR2025! My DMs are open, please reach out to chat about LLM reasoning/optimization/training dynamics!

Will be presenting a study on diagonal preconditioning optimizers for LLM pretraining (arxiv.org/abs/2407.07972) and SOAP (arxiv.org/abs/2409.11321)
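
(Context on the terminology, not part of the tweet: a minimal sketch of what an Adagrad/Adam-style diagonal preconditioner does, assuming a generic PyTorch setup. This illustrates the general idea only, not the specific optimizers or the SOAP variant studied in the linked papers.)

```python
import torch

def diagonal_preconditioned_step(param, grad, v, lr=1e-3, beta2=0.999, eps=1e-8):
    # Running exponential average of squared gradients: the diagonal preconditioner.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Rescale each coordinate by its own curvature proxy, i.e. diag(v)^(-1/2) @ grad.
    param = param - lr * grad / (v.sqrt() + eps)
    return param, v

# Toy usage on a single parameter tensor.
p = torch.zeros(4)
v = torch.zeros_like(p)
g = torch.tensor([0.1, -0.2, 0.3, 0.0])
p, v = diagonal_preconditioned_step(p, g, v)
```

Here v plays the role of the diagonal of the preconditioning matrix; methods in the Shampoo/SOAP family instead maintain non-diagonal (Kronecker-factored) statistics, which is roughly the contrast the papers above explore.
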
Depen Morwani (@depen_morwani) 's Twitter Profile Photo

Excited to attend #ICLR25 this week. My DMs are open, feel free to drop a message to talk about anything related to optimization of deep networks. Presenting multiple works related to second order optimization, critical batch size and diagonal preconditioning. Details below.

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text. bit.ly/KempnerVLM by Isabel Papadimitriou, Chloe H. Su, Thomas Fel, Stephanie Gil, and Sham Kakade #AI #ML #VLMs #SAEs

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much one can delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

NEW: Yilun Du of Google DeepMind & incoming #KempnerInstitute faculty explains how optimizing energy functions can help solve challenging navigation & reasoning problems. Watch the talk: youtube.com/watch?v=UKbLBO… #NeuroAI2025 #ML #neuroscience #NeuroAI

Aayush Karan (@aakaran31) 's Twitter Profile Photo

Steering diffusion models with external rewards has recently led to exciting results, but what happens when the reward is inherently difficult?

Introducing ReGuidance: a simple algorithm to (provably!) boost your favorite guidance method on hard problems! 🚀🚀🚀

A thread: (1/n)
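
(Background sketch, not part of the thread: "steering diffusion models with external rewards" typically means adding a reward-gradient term to the reverse-diffusion update. The snippet below is a hypothetical DDIM-style step with placeholder names — eps_model, reward_fn, alphas_cumprod — and illustrates plain reward guidance, not the ReGuidance algorithm itself.)

```python
import torch

def reward_guided_ddim_step(x_t, t, eps_model, reward_fn, alphas_cumprod, scale=1.0):
    # Placeholder components: eps_model predicts noise, reward_fn is a differentiable
    # reward on clean samples, alphas_cumprod is the noise schedule as a tensor.
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)

    # Standard x_0 estimate from the noisy sample and the predicted noise.
    eps = eps_model(x_t, t).detach()
    x0_hat = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)

    # Reward guidance: nudge the x_0 estimate up the reward gradient.
    x0_hat = x0_hat.detach().requires_grad_(True)
    grad = torch.autograd.grad(reward_fn(x0_hat).sum(), x0_hat)[0]
    x0_guided = (x0_hat + scale * grad).detach()

    # Deterministic (eta = 0) DDIM update toward the guided estimate.
    return torch.sqrt(a_prev) * x0_guided + torch.sqrt(1 - a_prev) * eps
```

Plain gradient nudging of this kind can struggle when the reward is hard to optimize, which appears to be the regime the thread addresses.
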
Hanlin Zhang (@_hanlin_zhang_) 's Twitter Profile Photo

[1/n] New work [JSKZ25] w/ Jikai Jin, Vasilis Syrgkanis, Sham Kakade.

We introduce new formulations and tools for evaluating language model capabilities, which help explain recent observations of post-training behaviors of Qwen-series models — there is a sensitive causal link
Chloe H. Su (@huangyu58589918) 's Twitter Profile Photo

What precision should we use to train large AI models effectively? Our latest research probes the subtle nature of training instabilities under low precision formats like MXFP8 and ways to mitigate them. Thread 🧵👇

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

I took this class. Good times! Thank you Zoubin Ghahramani and Geoffrey Hinton!! Yeah, no backprop. I view this as more the “modeling phase” of deep learning vs “scale”. I’m going with: the ideas are still relevant for AI4science.

Hanlin Zhang (@_hanlin_zhang_) 's Twitter Profile Photo

[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces:
✏️ Post-training based on off-the-shelf base models without transparent pre-training data components and scale.
✏️ Intermediate checkpoints with incomplete learning

Gabriel Poesia (@gabrielpoesia) 's Twitter Profile Photo

Thrilled to join the UMich faculty in 2026! I'll also be recruiting PhD students this upcoming cycle. If you're interested in AI and formal reasoning, consider applying!

Noah Golowich (@golowichnoah) 's Twitter Profile Photo

I'll be attending ICML this week; come stop by our poster on length generalization in LLMs on Tuesday morning (poster session 1 west)! Paper link: openreview.net/forum?id=S9LkB…

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

A team from #KempnerInstitute, Harvard SEAS & Computer Science at UT Austin has won a best paper award at #ICML2025 for work unlocking the potential of masked diffusion models. Congrats to Jaeyeon (Jay) Kim @ICML, Kulin Shah, Vasilis Kontonis, Sham Kakade and Sitan Chen. kempnerinstitute.harvard.edu/news/kempner-i… #AI