
Nicolas Espinosa Dice
@nico_espinosa_d
cs phd student @Cornell. previously @HarveyMudd. working on reinforcement learning & generative models
ID: 1747025370958692352
http://nico-espinosadice.github.io 15-01-2024 22:38:06
4 Tweets
49 Followers
205 Following

Shortcut models enable scaling offline RL, both at train-time and at test-time! We beat so many other algorithms on so many tasks that we had to stick most of the results in the appendix 😅. Very proud of Nicolas Espinosa Dice for spearheading this project — check out his thread!

Check out Nicolas Espinosa Dice's blog post on how we can enable test-time scaling of policies learned via offline RL! I am particularly impressed by the figures :).

I’m presenting two papers on value-based RL for post-training & reasoning on Friday at the AI for Math Workshop @ #ICML2025! 1️⃣ Q#: lays theoretical foundations for value-based RL for post-training LMs; 2️⃣ VGS: practical value-guided search scaled up for long CoT reasoning. 🧵👇

How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3-mini on math competition (AIME & HMMT) reasoning? Prior work suggests that ideas like PRMs do not really work or scale well for long-context reasoning. Kaiwen Wang will reveal how a novel
