Wen Sun (@wensun1)'s Twitter Profile
Wen Sun

@wensun1

Assistant Professor at Cornell CS. Machine Learning and Reinforcement Learning; check out the RL theory and algorithms book here: rltheorybook.github.io

ID: 824609918

Link: https://wensun.github.io · Joined: 15-09-2012 05:27:25

62 Tweets

548 Followers

59 Following

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

3/3 for #ICLR2025! Huge congratulations to the lead authors (Nicolas Espinosa-Dice, Zhaolin Gao, and Runzhe Wu). I'll save a more in-depth discussion of the papers for later, but if you'd like a sneak peek, check out arxiv.org/abs/2410.13855 / arxiv.org/abs/2410.04612!

Wen Sun (@wensun1)'s Twitter Profile Photo

Extremely honored to receive this award. Credit goes to my collaborators, mentors, and especially my amazing students! #SloanFellow

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

I think of misspecification (embodiment / sensory gaps) as the fundamental reason behavioral cloning isn't "all you need" for imitation as matching actions != matching outcomes. Introducing Nicolas Espinosa Dice's #ICLR2025 paper proving that "local search" *is* all you need! [1/n]

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

I won't be at #ICLR2025 myself this time around but please go talk to lead authors Nicolas Espinosa-Dice, Zhaolin Gao, and Runzhe Wu about their bleeding-edge algorithms for imitation learning and RLHF!

Runzhe Wu @ICLR2025 (@runzhe_wu)'s Twitter Profile Photo

#ICLR2025 Oral 🚨 Provably efficient RL has advanced significantly, but it's still unclear whether efficient algos exist even for the simple setting of "Linear Bellman Completeness." We solve the special case of deterministic state transitions using an approach we call the "span argument"! 🧵

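For context, a standard way to state the linear Bellman completeness condition (my paraphrase, not necessarily the paper's exact notation): with features $\phi(s,a) \in \mathbb{R}^d$, the linear class $\{(s,a) \mapsto \theta^\top \phi(s,a)\}$ is Bellman complete if for every $\theta$ there exists a $w(\theta)$ such that

$$ w(\theta)^\top \phi(s,a) \;=\; r(s,a) + \mathbb{E}_{s' \sim P(\cdot\mid s,a)}\Big[\max_{a'} \theta^\top \phi(s',a')\Big] \quad \text{for all } (s,a). $$

Under deterministic transitions $s' = f(s,a)$ the expectation drops out, which is the special case addressed in the thread.
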
ML@CMU (@mlcmublog)'s Twitter Profile Photo

blog.ml.cmu.edu/2025/06/01/rlh… In this in-depth coding tutorial, Zhaolin Gao and Gokul Swamy walk through the steps to train an LLM via RL from Human Feedback!
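
As background for the tutorial (this is the standard KL-regularized formulation, not a claim about the blog's exact notation), RLHF is typically posed as maximizing a learned reward while staying close to the reference policy:

$$ \max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot\mid x)}\big[r_\phi(x,y)\big] \;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\big[\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\big], $$

where $r_\phi$ is a reward model fit to human preference data and $\beta$ controls how far the policy may drift from $\pi_{\mathrm{ref}}$.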

Wen Sun (@wensun1)'s Twitter Profile Photo

A simple and efficient approach to RL for generative policies! Prior work typically requires massively extending the RL horizon or performing some kind of importance weighting followed by flow or score matching. By deploying a shortcut model, our SORL enables efficient training

Wen Sun (@wensun1)'s Twitter Profile Photo

Instead of formalizing reward-guided fine-tuning of diffusion models as (discrete or even continuous) MDPs and then using RL or control to optimize them (just way too complicated), simple interactive online learning with classification oracles is sufficient to achieve strong results.
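
To make the "classification oracle" idea concrete, here is a deliberately tiny toy sketch under my own assumptions (a cross-entropy-method-style update on a 1-D Gaussian generator, purely illustrative and not the paper's algorithm or model class): each round, sample from the current generator, label samples by whether they beat the batch's median reward, and fit the generator to the positively labeled samples.

import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Toy reward, peaked at x = 3.
    return -(x - 3.0) ** 2

mu, sigma = 0.0, 1.0  # toy "generator": a 1-D Gaussian with a learnable mean

for _ in range(20):
    samples = rng.normal(mu, sigma, size=256)               # interactive step: sample from current generator
    labels = reward(samples) > np.median(reward(samples))   # classification oracle: above-median reward?
    mu = samples[labels].mean()                              # fit the generator to the positively labeled samples

print(f"final mean: {mu:.3f}")  # ends up near the high-reward region around 3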

Kempner Institute at Harvard University (@kempnerinst)'s Twitter Profile Photo

New in the #DeeperLearningBlog: Researchers from the #KempnerInstitute, Cornell University, and CMU Robotics Institute introduce a new method for improving offline RL by scaling up test-time compute. kempnerinstitute.harvard.edu/research/deepe… #AI #RL (1/2)

Wen Sun (@wensun1)'s Twitter Profile Photo

Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really so magical that even RLing on random rewards can make it reason better? Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you

Wen Sun (@wensun1)'s Twitter Profile Photo

How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math competition reasoning (AIME & HMMT)? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long-context reasoning. Kaiwen Wang will reveal how a novel

Kempner Institute at Harvard University (@kempnerinst)'s Twitter Profile Photo

New in the #DeeperLearningBlog: Zhaolin Gao and collaborators, including the #KempnerInstitute's Kianté Brantley, present a powerful new #RL algorithm tailored for reasoning tasks with #LLMs that updates using only one generation per prompt. bit.ly/44US1Mt #AI
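
For intuition about why one generation per prompt can suffice (a generic sketch of my own, not necessarily the algorithm described in the blog post): REINFORCE with a learned per-prompt baseline removes the need for a group of samples to estimate an advantage.

import torch

def single_generation_pg_loss(logprob_sum, reward, baseline):
    # logprob_sum: summed log-prob of the one sampled response (requires grad)
    # reward:      scalar verifier / reward-model score for that response
    # baseline:    learned per-prompt value estimate
    advantage = (reward - baseline).detach()    # no gradient through the advantage
    pg_loss = -advantage * logprob_sum          # REINFORCE with a baseline
    value_loss = (baseline - reward) ** 2       # regress the baseline toward the reward
    return pg_loss + 0.5 * value_loss

# Toy usage with scalars standing in for real model outputs.
logprob = torch.tensor(-12.3, requires_grad=True)
baseline = torch.tensor(0.4, requires_grad=True)
single_generation_pg_loss(logprob, reward=1.0, baseline=baseline).backward()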

Dipendra Misra (@dipendramisra)'s Twitter Profile Photo

At Databricks, we are using RLVR to solve real-world problems. This requires both great science & engineering! As an example of the power of our training stack, we were able to reach the top of the popular BIRD single-model, single-call leaderboard on our first attempt!

Kianté Brantley (@xkianteb)'s Twitter Profile Photo

If you are on the market for a postdoc, this opportunity is amazing! We have lots of compute resources and a vibrant community to help foster innovative ideas! Feel free to reach out to me if you have any questions.

Richard Pang (@yzpang_)'s Twitter Profile Photo


🚨Prompt Curriculum Learning (PCL) 
- Efficient LLM RL training algo!
- We investigate factors that affect convergence: batch size, number of prompts, number of generations, and prompt selection
- We propose PCL: lightweight algo that *dynamically selects intermediate-difficulty prompts* using a learned value model
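
A minimal sketch of the selection step as I read the thread (hypothetical names and shapes, not the authors' code): use the learned value model to predict each prompt's current success probability and prioritize prompts whose prediction is closest to 0.5, i.e. intermediate difficulty.

import torch

def select_intermediate_prompts(prompt_feats, value_model, k, target=0.5):
    # prompt_feats: [num_prompts, dim] prompt representations
    # value_model:  maps a representation to a success logit for that prompt
    with torch.no_grad():
        p_success = torch.sigmoid(value_model(prompt_feats)).squeeze(-1)
    scores = (p_success - target).abs()      # distance from "intermediate difficulty"
    return torch.topk(-scores, k).indices    # k prompts whose prediction is closest to target

# Toy usage: a linear value head over random prompt features.
value_model = torch.nn.Linear(16, 1)
prompt_feats = torch.randn(100, 16)
batch_idx = select_intermediate_prompts(prompt_feats, value_model, k=8)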