Wen Sun
@wensun1
Assistant Professor at Cornell CS. Machine Learning and Reinforcement Learning; check out the RL algorithms and theory book here: rltheorybook.github.io
ID: 824609918
https://wensun.github.io 15-09-2012 05:27:25
62 Tweets
548 Followers
59 Following
3/3 for #ICLR2025! Huge congratulations to lead authors (Nicolas Espinosa-Dice, Zhaolin Gao, and Runzhe Wu). I'll save a more in-depth discussion of the papers for later, but if you'd like a sneak peek, check out arxiv.org/abs/2410.13855 / arxiv.org/abs/2410.04612!
I think of misspecification (embodiment / sensory gaps) as the fundamental reason behavioral cloning isn't "all you need" for imitation: matching actions != matching outcomes. Introducing Nicolas Espinosa-Dice's #ICLR2025 paper proving that "local search" *is* all you need! [1/n]
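To make the actions-vs-outcomes distinction concrete, here is a minimal toy sketch (an illustration only, not the paper's algorithm), assuming a hypothetical 1D embodiment gap where the same action moves the learner half as far as the expert: copying the expert's actions drifts, while locally searching for the action that reproduces the expert's next state under the learner's own dynamics tracks the outcome.

```python
# Toy sketch (illustrative only): matching actions != matching outcomes
# when the learner's dynamics differ from the expert's (an embodiment gap).
import numpy as np

def expert_step(s, a):   # expert's dynamics
    return s + 1.0 * a

def learner_step(s, a):  # learner's dynamics: same action moves it only half as far
    return s + 0.5 * a

expert_actions = [1.0] * 10           # expert demo: always push right
s_exp = s_bc = s_ls = 0.0
for a in expert_actions:
    s_exp = expert_step(s_exp, a)     # expert's trajectory
    s_bc = learner_step(s_bc, a)      # behavioral cloning: copy the action verbatim
    # local search: pick the action whose *outcome* matches the expert's next state
    candidates = np.linspace(-2, 2, 401)
    errors = [(learner_step(s_ls, c) - s_exp) ** 2 for c in candidates]
    s_ls = learner_step(s_ls, candidates[int(np.argmin(errors))])

print(f"expert final state:            {s_exp:.2f}")
print(f"BC (action matching) final:    {s_bc:.2f}")   # drifts to half the distance
print(f"local outcome matching final:  {s_ls:.2f}")   # tracks the expert
```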
I won't be at #ICLR2025 myself this time around, but please go talk to lead authors Nicolas Espinosa-Dice, Zhaolin Gao, and Runzhe Wu about their bleeding-edge algorithms for imitation learning and RLHF!
blog.ml.cmu.edu/2025/06/01/rlh… In this in-depth coding tutorial, Zhaolin Gao and Gokul Swamy walk through the steps to train an LLM via RL from Human Feedback!
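For a rough sense of the recipe the tutorial walks through, here is a minimal hedged sketch of a generic RLHF-style update with toy stand-ins throughout (a tiny categorical "policy," a dummy reward model, and a uniform reference model are all assumptions; this is not the tutorial's actual code): sample a response from the current policy, score it with the reward model, and take a reward-weighted log-likelihood step with a KL penalty to the frozen reference.

```python
# Generic RLHF-style loop, sketched with toy stand-ins (not the tutorial's code).
import torch
import torch.nn.functional as F

vocab, horizon = 8, 5
policy_logits = torch.zeros(vocab, requires_grad=True)     # toy "LLM": one categorical per token
ref_logits = torch.zeros(vocab)                             # frozen reference model
reward_model = lambda tokens: (tokens == 3).float().mean()  # toy RM: prefers token 3
opt = torch.optim.Adam([policy_logits], lr=0.1)

for step in range(200):
    probs = F.softmax(policy_logits, dim=-1)
    tokens = torch.multinomial(probs, horizon, replacement=True)  # sample a "response"
    logp = torch.log(probs[tokens]).sum()                         # log-prob of the response
    reward = reward_model(tokens)
    # KL(policy || reference) keeps the policy close to the reference model
    kl = F.kl_div(F.log_softmax(ref_logits, -1), probs, reduction="sum")
    loss = -(reward * logp) + 0.05 * kl                            # REINFORCE term + KL penalty
    opt.zero_grad(); loss.backward(); opt.step()

print(F.softmax(policy_logits, -1))  # probability mass should concentrate on token 3
```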
New in the #DeeperLearningBlog: Researchers from the #KempnerInstitute, Cornell University and CMU Robotics Institute introduce a new method for improving offline RL by scaling up test-time compute. kempnerinstitute.harvard.edu/research/deepe… #AI #RL (1/2)
How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini on math-competition reasoning (AIME & HMMT)? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long-context reasoning. Kaiwen Wang will reveal how a novel…
New in the #DeeperLearningBlog: Zhaolin Gao and collaborators, including the #KempnerInstitute's Kianté Brantley, present a powerful new #RL algorithm tailored for reasoning tasks with #LLMs that updates using only one generation per prompt. bit.ly/44US1Mt #AI
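As a rough illustration of what a one-generation-per-prompt update could look like, here is a generic single-sample policy-gradient sketch with a running-average baseline (the toy policy, verifier, and baseline choice are all assumptions for illustration; this is not the specific algorithm in the blog post): with only one generation per prompt there is no group of samples to form a relative baseline, so an exponential moving average of past rewards stands in.

```python
# Generic single-generation-per-prompt policy-gradient update (illustrative sketch).
import torch
import torch.nn.functional as F

vocab = 16
policy_logits = torch.zeros(vocab, requires_grad=True)  # toy per-token policy standing in for an LLM
opt = torch.optim.SGD([policy_logits], lr=0.5)
verifier = lambda answer: float(answer.item() == 7)      # toy verifiable reward: 1 if the "answer" token is 7
baseline = 0.0

for prompt in range(300):
    probs = F.softmax(policy_logits, dim=-1)
    answer = torch.multinomial(probs, 1)                  # exactly ONE generation for this prompt
    reward = verifier(answer)
    advantage = reward - baseline                         # center with a running baseline, not a group mean
    loss = -(advantage * torch.log(probs[answer])).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    baseline = 0.9 * baseline + 0.1 * reward              # update the running baseline

print(F.softmax(policy_logits, -1)[7])  # probability of the "correct" token should rise
```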
At Databricks, we are using RLVR to solve real-world problems. This requires both great science & engineering! As an example of the power of our training stack, we were able to reach the top of the popular Bird single-model single-call leaderboard in our first attempt!