Sham Kakade (@shamkakade6) 's Twitter Profile
Sham Kakade

@shamkakade6

Harvard Professor.
Full stack ML and AI.
Co-director of the Kempner Institute for the Study of Natural and Artificial Intelligence.

ID: 1069106254927273984

https://shamulent.github.io · Joined 02-12-2018 05:49:28

418 Tweets

14.14K Followers

463 Following

Rosie Zhao (@rosieyzh) 's Twitter Profile Photo

Excited to be attending 🇸🇬#ICLR2025! My DMs are open, please reach out to chat about LLM reasoning/optimization/training dynamics!

Will be presenting a study on diagonal preconditioning optimizers for LLM pretraining (arxiv.org/abs/2407.07972) and SOAP (arxiv.org/abs/2409.11321)
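
(Context on the terminology, not part of the tweet: a minimal sketch of what an Adagrad/Adam-style diagonal preconditioner does, assuming a generic PyTorch setup. This illustrates the general idea only, not the specific optimizers or the SOAP variant studied in the linked papers.)

```python
import torch

def diagonal_preconditioned_step(param, grad, v, lr=1e-3, beta2=0.999, eps=1e-8):
    # Running exponential average of squared gradients: the diagonal preconditioner.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Rescale each coordinate by its own curvature proxy, i.e. diag(v)^(-1/2) @ grad.
    param = param - lr * grad / (v.sqrt() + eps)
    return param, v

# Toy usage on a single parameter tensor.
p = torch.zeros(4)
v = torch.zeros_like(p)
g = torch.tensor([0.1, -0.2, 0.3, 0.0])
p, v = diagonal_preconditioned_step(p, g, v)
```

Here v plays the role of the diagonal of the preconditioning matrix; methods in the Shampoo/SOAP family instead maintain non-diagonal (Kronecker-factored) statistics, which is roughly the contrast the papers above explore.
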
Depen Morwani (@depen_morwani) 's Twitter Profile Photo

Excited to attend #ICLR25 this week. My DMs are open, feel free to drop a message to talk about anything related to optimization of deep networks. Presenting multiple works related to second order optimization, critical batch size and diagonal preconditioning. Details below.

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text. bit.ly/KempnerVLM by Isabel Papadimitriou, Chloe H. Su, Thomas Fel, Stephanie Gil, and Sham Kakade #AI #ML #VLMs #SAEs

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much one can delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

NEW: Yilun Du of Google DeepMind & incoming #KempnerInstitute faculty explains how optimizing energy functions can help solve challenging navigation & reasoning problems. Watch the talk: youtube.com/watch?v=UKbLBO… #NeuroAI2025 #ML #neuroscience #NeuroAI

Aayush Karan (@aakaran31) 's Twitter Profile Photo

Steering diffusion models with external rewards has recently led to exciting results, but what happens when the reward is inherently difficult?

Introducing ReGuidance: a simple algorithm to (provably!) boost your favorite guidance method on hard problems! 🚀🚀🚀

A thread: (1/n)
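
(Background sketch, not part of the thread: "steering diffusion models with external rewards" typically means adding a reward-gradient term to the reverse-diffusion update. The snippet below is a hypothetical DDIM-style step with placeholder names — eps_model, reward_fn, alphas_cumprod — and illustrates plain reward guidance, not the ReGuidance algorithm itself.)

```python
import torch

def reward_guided_ddim_step(x_t, t, eps_model, reward_fn, alphas_cumprod, scale=1.0):
    # Placeholder components: eps_model predicts noise, reward_fn is a differentiable
    # reward on clean samples, alphas_cumprod is the noise schedule as a tensor.
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)

    # Standard x_0 estimate from the noisy sample and the predicted noise.
    eps = eps_model(x_t, t).detach()
    x0_hat = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)

    # Reward guidance: nudge the x_0 estimate up the reward gradient.
    x0_hat = x0_hat.detach().requires_grad_(True)
    grad = torch.autograd.grad(reward_fn(x0_hat).sum(), x0_hat)[0]
    x0_guided = (x0_hat + scale * grad).detach()

    # Deterministic (eta = 0) DDIM update toward the guided estimate.
    return torch.sqrt(a_prev) * x0_guided + torch.sqrt(1 - a_prev) * eps
```

Plain gradient nudging of this kind can struggle when the reward is hard to optimize, which appears to be the regime the thread addresses.
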
Hanlin Zhang (@_hanlin_zhang_) 's Twitter Profile Photo

[1/n] New work [JSKZ25] w/ Jikai Jin, Vasilis Syrgkanis, Sham Kakade.

We introduce new formulations and tools for evaluating language model capabilities, which help explain recent observations of post-training behaviors of Qwen-series models — there is a sensitive causal link
Chloe H. Su (@huangyu58589918) 's Twitter Profile Photo

What precision should we use to train large AI models effectively? Our latest research probes the subtle nature of training instabilities under low precision formats like MXFP8 and ways to mitigate them. Thread 🧵👇

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

I took this class. Good times! Thank you Zoubin Ghahramani and Geoffrey Hinton!! Yeah, no backprop. I view this as more the “modeling phase” of deep learning vs “scale”. I’m going with: the ideas are still relevant for AI4science.

Hanlin Zhang (@_hanlin_zhang_) 's Twitter Profile Photo

[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces:
✏️ Post-training based on off-the-shelf base models without transparent pre-training data components and scale.
✏️ Intermediate checkpoints with incomplete learning

Gabriel Poesia (@gabrielpoesia) 's Twitter Profile Photo

Thrilled to join the UMich faculty in 2026! I'll also be recruiting PhD students this upcoming cycle. If you're interested in AI and formal reasoning, consider applying!

Noah Golowich (@golowichnoah) 's Twitter Profile Photo

I'll be attending ICML this week; come stop by our poster on length generalization in LLMs on Tuesday morning (poster session 1 west)! Paper link: openreview.net/forum?id=S9LkB…

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

A team from #KempnerInstitute, Harvard SEAS & Computer Science at UT Austin has won a best paper award at #ICML2025 for work unlocking the potential of masked diffusion models. Congrats to Jaeyeon (Jay) Kim @ICML, Kulin Shah, Vasilis Kontonis, Sham Kakade and Sitan Chen. kempnerinstitute.harvard.edu/news/kempner-i… #AI