Nicolas Espinosa Dice (@nico_espinosa_d)'s Twitter Profile
Nicolas Espinosa Dice

@nico_espinosa_d

cs phd student @Cornell. previously @HarveyMudd. working on reinforcement learning & generative models

ID: 1747025370958692352

Link: http://nico-espinosadice.github.io · Joined: 15-01-2024 22:38:06

4 Tweets

49 Followers

205 Following

Nived Rajaraman (@nived_rajaraman)

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! 🗓️ Deadline: May 19, 2025

Gokul Swamy (@g_k_swamy)

Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ outperforms Diffusion Policies trained via behavioral cloning on 5-10x the data!

Gokul Swamy (@g_k_swamy)

Shortcut models enable scaling offline RL, both at train-time and at test-time! We beat so many other algorithms on so many tasks we had to stick most of the results in the appendix 😅. Very proud of Nicolas Espinosa Dice for spearheading this project, check out his thread!

Wen Sun (@wensun1)

A simple and efficient approach to RL for generative policies! Prior work typically requires massively extending the RL horizon or performing some kind of importance weighting followed by flow or score matching. By deploying a shortcut model, our SORL enables efficient training
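For readers unfamiliar with shortcut models, here is a rough sketch of few-step action sampling under the generic shortcut-model recipe (a flow-style generator conditioned on both the current time t and a step size d). The `policy` interface, function names, and the 4-step budget are illustrative assumptions, not SORL's actual code.

```python
# Minimal sketch (assumed interface, not the authors' implementation) of sampling
# an action from a shortcut-model policy: because the model is conditioned on a
# step size d, it can jump from noise to an action in a handful of large steps.
import torch

@torch.no_grad()
def sample_action(policy, state, action_dim, num_steps=4):
    """policy(state, x, t, d) -> velocity; this signature is hypothetical."""
    x = torch.randn(action_dim)              # start from Gaussian noise
    d = 1.0 / num_steps                      # constant step size for this budget
    t = 0.0
    for _ in range(num_steps):
        v = policy(state, x, torch.tensor(t), torch.tensor(d))
        x = x + d * v                        # Euler update along the learned flow
        t += d
    return x                                 # denoised sample = action to execute
```

The appeal is that the same trained network can trade compute for sample quality at test time simply by changing num_steps, without retraining.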

Owen Oertell (@owenoertell)

Tired of over-optimized generations that stray too far from the base distribution? We present SLCD: Supervised Learning based Controllable Diffusion, which (provably) solves the KL-constrained reward maximization problem for diffusion through supervised learning! (1/n)
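For reference, the KL-constrained reward maximization objective referred to here is conventionally written as below, where r is the reward, π_base the base diffusion model's distribution, and β the constraint strength (symbol names are my own notation, not necessarily the paper's):

```latex
\max_{\pi} \;\; \mathbb{E}_{x \sim \pi}\big[\, r(x) \,\big] \;-\; \beta \, \mathrm{KL}\!\left( \pi \,\|\, \pi_{\text{base}} \right)
```

Larger β keeps generations closer to the base model; smaller β allows more aggressive reward optimization.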

Gokul Swamy (@g_k_swamy)

It was a dream come true to teach the course I wish existed at the start of my PhD. We built up the algorithmic foundations of modern-day RL, imitation learning, and RLHF, going deeper than the usual "grab bag of tricks". All 25 lectures + 150 pages of notes are now public! 🧵

Gokul Swamy (@g_k_swamy)

Check out Nicolas Espinosa Dice's blog post on how we can enable test-time scaling of policies learned via offline RL! I am particularly impressed by the figures :).

Kaiwen Wang (@kaiwenw_ai)

I’m presenting two papers on value-based RL for post-training & reasoning on Friday at the AI for Math Workshop at #ICML2025! 1️⃣ Q#: lays the theoretical foundations for value-based RL for post-training LMs; 2️⃣ VGS: practical value-guided search scaled up for long CoT reasoning. 🧵👇
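As a mental model only (my own simplification, not the released VGS code), value-guided search can be thought of as repeatedly sampling candidate continuations from the LM and letting a learned value model decide which partial chain of thought to keep:

```python
# Rough sketch of value-guided search: at each step, sample several candidate
# continuations, score each partial solution with a learned value model, and
# keep only the highest-value candidate before continuing. All names and the
# "<END>" stop marker are hypothetical placeholders.
def value_guided_search(lm_sample, value_fn, prompt, num_candidates=8, max_steps=16):
    """lm_sample(text) -> next reasoning block; value_fn(text) -> float.
    Both callables stand in for a policy LM and a value model."""
    solution = prompt
    for _ in range(max_steps):
        candidates = [solution + lm_sample(solution) for _ in range(num_candidates)]
        solution = max(candidates, key=value_fn)   # greedy step on the value model
        if solution.endswith("<END>"):             # stop once a full answer is reached
            break
    return solution
```

Beam-style variants keep the top-k candidates at each step rather than a single one; the sketch above is the greedy special case.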

Wen Sun (@wensun1)

How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math competition (AIME & HMMT) reasoning? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long context reasoning. Kaiwen Wang will reveal how a novel