Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile
Yu Meng @ ICLR'25

@yumeng0818

Asst. Professor @CS_UVA (LLM/ML/NLP)

Past: PhD from @IllinoisCS, visiting researcher @princeton_nlp, Google PhD Fellow.


Website: https://yumeng5.github.io/ · Joined: 02-11-2021 02:39:43

134 Tweets

1.1K Followers

278 Following

Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long

Xinyu Zhu (@tianhongzxy) 's Twitter Profile Photo

🔥The debate’s been wild: How does the reward in RLVR actually improve LLM reasoning?🤔
🚀Introducing our new paper👇
💡TL;DR: Just penalizing incorrect rollouts❌ — no positive reward needed — can boost LLM reasoning, and sometimes better than PPO/GRPO!

🧵[1/n]
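
To make the idea concrete, here is a minimal sketch of a negative-samples-only policy-gradient loss, assuming 0/1 verifiable rewards and one summed log-probability per sampled rollout; the function name and tensor shapes are illustrative assumptions, not the paper's released implementation.

```python
import torch

def negative_only_loss(rollout_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Sketch of a negative-samples-only policy-gradient loss.

    rollout_logprobs: (batch,) summed log-probability of each sampled rollout
    rewards:          (batch,) verifiable reward, 1.0 = correct, 0.0 = incorrect

    Only incorrect rollouts contribute: minimizing their log-probability pushes
    the model away from them, while correct rollouts get zero gradient, so the
    policy is never sharpened toward any particular positive sample.
    """
    incorrect = (rewards == 0).float()
    # Mean log-probability over incorrect rollouts; gradient descent lowers it.
    return (incorrect * rollout_logprobs).sum() / incorrect.sum().clamp(min=1.0)
```

Because correct rollouts receive no gradient under this kind of objective, the output distribution stays comparatively flat, which is the entropy argument picked up in the replies below.
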
Andrew Zhao (@andrewz45732491) 's Twitter Profile Photo

hmmm if you never push up, you maintain more entropy by not doing excessive sharpening. These guys might be onto something🧐

Mengzhou Xia (@xiamengzhou) 's Twitter Profile Photo

Surprisingly, we find training only with incorrect traces leads to strong performance 🤯 Even more interesting: it improves model diversity and test-time scaling—while correct traces do the opposite. Check out the 🧵👇

1a3orn (@1a3orn) 's Twitter Profile Photo

Oh man this is a gorgeous idea. Training *against* negative samples but not towards positive ones maintains entropy in the model, therefore increases pass@high k during RL.
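
For context, pass@k measures the chance that at least one of k samples solves the problem; the standard unbiased estimator (from Chen et al., 2021, not something introduced in this thread) computes it from n generations of which c are correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect generations, so any k-draw hits a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The tweet's point is that a higher-entropy policy keeps rollouts more diverse, so more problems end up with at least one correct sample among n, which raises pass@k at large k.
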

Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile Photo

Want powerful reasoning in LLMs without the massive RL training costs? 🤯 Our new paper (led by Siru Ouyang) explores transferring reasoning abilities directly from smaller LMs!🚀🚀

机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Paper: arxiv.org/pdf/2506.01347…
Code: github.com/TianHongZXY/RL…
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Only ding a model for making mistakes! It gives better results in RL and avoids mode collapse. We still understand so little about RL! But we’re learning. Your science dollars at work.

Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile Photo

Excited to share our #ICML25 paper (led by Zhepei Wei) on accelerating LLM decoding! ⚡️
AdaDecode predicts tokens early from intermediate layers
🙅‍♂️ No drafter model needed
🪶 Just lightweight LM heads
✨ Output consistency with standard autoregressive decoding
Thread👇
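
As a rough illustration only (not AdaDecode's actual algorithm; the head, threshold, and confidence gate below are assumptions), drafting a token early from an intermediate layer might look like:

```python
import torch

def maybe_draft_early(lm_head, hidden_state, threshold: float = 0.9):
    """Hypothetical confidence-gated early exit at an intermediate layer.

    hidden_state: hidden vector at some intermediate layer for the current position
    lm_head:      a lightweight LM head attached to that layer

    If the intermediate prediction is confident enough, return it as a draft
    token; otherwise signal that the remaining layers should run as usual.
    """
    probs = torch.softmax(lm_head(hidden_state), dim=-1)
    confidence, token = probs.max(dim=-1)
    if confidence.item() >= threshold:
        return token.item()  # draft now, verify with the full model later
    return None              # not confident: continue through the remaining layers
```

Drafted tokens would then have to be verified against the full model, which is how the output-consistency claim in the tweet can hold.
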

Jiaxin Huang (@jiaxinhuang0229) 's Twitter Profile Photo

🚀🚀Excited to share our new work on Speculative Decoding by Langlin Huang! We tackle a key limitation of draft models, which predict worse tokens at later positions, and present PosS, which generates high-quality drafts!
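
For background on why draft quality at later positions matters, here is the standard speculative-decoding acceptance rule (Leviathan et al., 2023), not PosS itself: the first rejected draft token discards everything after it, so weak late-position drafts directly cap the speedup.

```python
import torch

def count_accepted(draft_tokens, p_draft, p_target) -> int:
    """Standard speculative-decoding acceptance over a gamma-token draft.

    draft_tokens: (gamma,) tokens proposed by the draft model
    p_draft:      (gamma, vocab) draft-model probabilities at each position
    p_target:     (gamma, vocab) target-model probabilities at each position

    Each draft token is accepted with probability min(1, p_target / p_draft);
    the loop stops at the first rejection, dropping all later draft tokens.
    """
    accepted = 0
    for i, tok in enumerate(draft_tokens):
        ratio = p_target[i, tok] / p_draft[i, tok]
        if torch.rand(()) < torch.clamp(ratio, max=1.0):
            accepted += 1
        else:
            break
    return accepted
```
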

Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Excited to be at #CVPR2025 this week! I’ll be talking about tool-augmented multimodal reasoning in Thursday’s tutorial. Come say hi if you’re around🍻

⏰ 1:30–5:00 PM CDT, June 12
📍 Room 107 B, CVPR venue
Gautam Kamath (@thegautamkamath) 's Twitter Profile Photo

I wrote a short post on some etiquette for a seemingly mundane task: declining an offer (for a job, internship, grad school, etc). Link in next tweet. 1/2

Siru Ouyang (@siru_ouyang) 's Twitter Profile Photo

🚀 Finally live on arXiv after 3 weeks on hold 🤣 Check it out 👉 arxiv.org/abs/2506.15710 #Reasoning #reinforcementlearning #LLMs

CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile Photo

Will be at #ICML2025 next week! We'll present the following works:
🛠️ LarPO: Tue 7/15 (Poster Session 1 East)
🚀 AdaDecode: Wed 7/16 (Poster Session 3 East)
🧮 Negative Reinforcement for Reasoning: Fri 7/18 (AI for Math Workshop)
Happy to chat about latest research in LLMs🤩
Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

Thrilled to present three works at #ICML2025!🥳
🚀AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605)
🔢Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop
🤖WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents
Feel free to stop by and chat about #LLMs!