Nicolas Espinosa Dice (@nico_espinosa_d)'s Twitter Profile
Nicolas Espinosa Dice

@nico_espinosa_d

cs phd student @Cornell. previously @HarveyMudd. working on reinforcement learning & generative models

ID: 1747025370958692352

Link: http://nico-espinosadice.github.io · Joined: 15-01-2024 22:38:06

4 Tweets

49 Followers

205 Following

Nived Rajaraman (@nived_rajaraman)

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! 🗓️ Deadline: May 19, 2025

Gokul Swamy (@g_k_swamy)

Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ outperforms Diffusion Policies trained via behavioral cloning on 5-10x the data!

Gokul Swamy (@g_k_swamy)

Shortcut models enable scaling offline RL, both at train-time and at test-time! We beat so many other algorithms on so many tasks we had to stick most of the results in the appendix 😅. Very proud of Nicolas Espinosa Dice for spearheading this project, check out his thread!

Wen Sun (@wensun1)

A simple and efficient approach to RL for generative policies! Prior work typically requires massively extending the RL horizon or performing some kind of importance weighting followed by flow or score matching. By deploying a shortcut model, our SORL enables efficient training
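For readers unfamiliar with shortcut models, here is a rough sketch of few-step action sampling under the generic shortcut-model recipe (a flow-style generator conditioned on both the current time t and a step size d). The `policy` interface, function names, and the 4-step budget are illustrative assumptions, not SORL's actual code.

```python
# Minimal sketch (assumed interface, not the authors' implementation) of sampling
# an action from a shortcut-model policy: because the model is conditioned on a
# step size d, it can jump from noise to an action in a handful of large steps.
import torch

@torch.no_grad()
def sample_action(policy, state, action_dim, num_steps=4):
    """policy(state, x, t, d) -> velocity; this signature is hypothetical."""
    x = torch.randn(action_dim)              # start from Gaussian noise
    d = 1.0 / num_steps                      # constant step size for this budget
    t = 0.0
    for _ in range(num_steps):
        v = policy(state, x, torch.tensor(t), torch.tensor(d))
        x = x + d * v                        # Euler update along the learned flow
        t += d
    return x                                 # denoised sample = action to execute
```

The appeal is that the same trained network can trade compute for sample quality at test time simply by changing num_steps, without retraining.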

Owen Oertell (@owenoertell)

Tired of over-optimized generations that stray too far from the base distribution? We present SLCD: Supervised Learning based Controllable Diffusion, which (provably) solves the KL-constrained reward maximization problem for diffusion through supervised learning! (1/n)
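For reference, the KL-constrained reward maximization objective referred to here is conventionally written as below, where r is the reward, π_base the base diffusion model's distribution, and β the constraint strength (symbol names are my own notation, not necessarily the paper's):

```latex
\max_{\pi} \;\; \mathbb{E}_{x \sim \pi}\big[\, r(x) \,\big] \;-\; \beta \, \mathrm{KL}\!\left( \pi \,\|\, \pi_{\text{base}} \right)
```

Larger β keeps generations closer to the base model; smaller β allows more aggressive reward optimization.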

Gokul Swamy (@g_k_swamy)

It was a dream come true to teach the course I wish existed at the start of my PhD. We built up the algorithmic foundations of modern-day RL, imitation learning, and RLHF, going deeper than the usual "grab bag of tricks". All 25 lectures + 150 pages of notes are now public! 🧵

Gokul Swamy (@g_k_swamy)

Check out Nicolas Espinosa Dice's blog post on how we can enable test-time scaling of policies learned via offline RL! I am particularly impressed by the figures :).

Kaiwen Wang (@kaiwenw_ai)

I’m presenting two papers on value-based RL for post-training & reasoning on Friday at the AI for Math Workshop at #ICML2025! 1️⃣ Q#: lays the theoretical foundations for value-based RL for post-training LMs; 2️⃣ VGS: practical value-guided search scaled up for long CoT reasoning. 🧵👇
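As a mental model only (my own simplification, not the released VGS code), value-guided search can be thought of as repeatedly sampling candidate continuations from the LM and letting a learned value model decide which partial chain of thought to keep:

```python
# Rough sketch of value-guided search: at each step, sample several candidate
# continuations, score each partial solution with a learned value model, and
# keep only the highest-value candidate before continuing. All names and the
# "<END>" stop marker are hypothetical placeholders.
def value_guided_search(lm_sample, value_fn, prompt, num_candidates=8, max_steps=16):
    """lm_sample(text) -> next reasoning block; value_fn(text) -> float.
    Both callables stand in for a policy LM and a value model."""
    solution = prompt
    for _ in range(max_steps):
        candidates = [solution + lm_sample(solution) for _ in range(num_candidates)]
        solution = max(candidates, key=value_fn)   # greedy step on the value model
        if solution.endswith("<END>"):             # stop once a full answer is reached
            break
    return solution
```

Beam-style variants keep the top-k candidates at each step rather than a single one; the sketch above is the greedy special case.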

Wen Sun (@wensun1)

How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math competition (AIME & HMMT) reasoning? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long context reasoning. Kaiwen Wang will reveal how a novel