
Kanishk Gandhi
@gandhikanishk
PhD CS @Stanford @StanfordNLP, Computation and Cognition; w/ Noah Goodman | Prev: @LakeBrenden @NYUDataScience, @IITKanpur, @Path_AI
ID: 489550720
11-02-2012 17:00:25
401 Tweets
1.1K Followers
871 Following


🚨 Your RL only improves pass@1, not pass@k? 🚨 That's not a bug, it's a feature of the objective you're optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
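For context, pass@k is the probability that at least one of k sampled completions is correct. A minimal sketch of the standard unbiased estimator (given n samples with c correct), plus a hypothetical pass@k-style group reward of the kind the thread suggests optimizing; the function names here are illustrative, not from the thread:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions drawn (without replacement) from n samples is correct,
    given that c of the n samples are correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def group_pass_at_k_reward(correct_flags: list[bool]) -> float:
    """Hypothetical pass@k-style group reward (an assumption, not the
    thread's exact method): a group of k sampled completions earns 1.0
    if ANY sample is correct, 0.0 otherwise. Optimizing this rewards
    diverse groups rather than a single high-probability answer."""
    return 1.0 if any(correct_flags) else 0.0
```

With a pass@1-style per-sample reward, every sample is pushed toward the single most likely correct answer; the group-level reward above only cares that one member of the group succeeds, which is one way the training objective can be aligned with the pass@k metric.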

I'm late to review the "Illusion of Thinking" paper, so let me collect some of the best threads and critical takes by Lisan al Gaib in one place and sprinkle in some of my own thoughts as well. The paper is rather critical of reasoning LLMs (LRMs): x.com/MFarajtabar/st…

