Kanishk Gandhi (@gandhikanishk)'s Twitter Profile
Kanishk Gandhi

@gandhikanishk

PhD CS@Stanford @StanfordNLP, Computation and Cognition; w/ Noah Goodman | Prev: @LakeBrenden @NYUDataScience, @IITKanpur, @Path_AI

ID: 489550720

Joined: 11-02-2012 17:00:25

401 Tweets

1.1K Followers

871 Following

Kunhao Zheng @ ICLR 2025 (@kunhaoz)'s Twitter Profile Photo

🚨 Your RL only improves pass@1, not pass@k? 🚨

That's not a bug; it's a feature of the objective you're optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How?
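
To make the pass@1 vs. pass@k distinction concrete, here is a minimal sketch (added here, not from the thread): the standard unbiased pass@k estimator from Chen et al. (2021), plus a hypothetical group-level reward that credits coverage across k rollouts. The reward shaping is an assumed illustration, not necessarily the thread's exact training objective.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples passes, given c correct out of n generations."""
    if n - c < k:
        return 1.0  # fewer than k failures, so every k-subset contains a pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def pass_at_k_reward(group_successes: list[bool]) -> float:
    """Hypothetical group-level RL reward: credit a group of k rollouts if
    ANY of them solved the task, pushing the policy toward coverage
    (pass@k) rather than single-sample accuracy (pass@1)."""
    return float(any(group_successes))

# 4 correct out of 16 samples: pass@1 is plain accuracy, pass@8 is far higher.
print(pass_at_k(n=16, c=4, k=1))  # 0.25
print(pass_at_k(n=16, c=4, k=8))  # ~0.96
```
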
Andrew Lampinen (@andrewlampinen)'s Twitter Profile Photo

How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning, and ways to improve finetuning. Thread: 1/

Shashwat Goel (@shashwatgoel7)'s Twitter Profile Photo

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too, until we looked at the reported numbers for the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below 🧵👇

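As a side note on why baseline numbers drift, here is a hypothetical illustration (mine, not the blog's methodology): pass@1 estimated from a single sample per problem is noisy, so a small or single-sample pre-RL eval can understate the true baseline and inflate the apparent RL gain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic benchmark: 100 problems, each with a true per-sample solve rate.
true_rates = rng.uniform(0.0, 0.8, size=100)
print("true pass@1:       ", round(true_rates.mean(), 3))

# Baseline from one sample per problem: a noisy estimate.
single = rng.random(100) < true_rates
print("1-sample estimate: ", round(single.mean(), 3))

# Baseline averaged over 64 samples per problem: much tighter.
many = rng.random((100, 64)) < true_rates[:, None]
print("64-sample estimate:", round(many.mean(), 3))
```
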
Omar Khattab (@lateinteraction)'s Twitter Profile Photo

Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long

Omar Shaikh (@oshaikh13)'s Twitter Profile Photo

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Andreas Kirsch 🇺🇦 (@blackhc)'s Twitter Profile Photo

I'm late to review the "Illusion of Thinking" paper, so let me collect some of the best threads and critical takes by Lisan al Gaib in one place and sprinkle in some of my own thoughts as well. The paper is rather critical of reasoning LLMs (LRMs): x.com/MFarajtabar/st…

Luiz Pessoa (@pessoabrain)'s Twitter Profile Photo

I wish people would stop sharing this article without evaluating it. One might not like AI, but a paper being critical of AI does not by itself make it valuable. That's not how science works.