Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile
Yu Meng @ ICLR'25

@yumeng0818

Asst. Professor @CS_UVA (LLM/ML/NLP)

Past: PhD from @IllinoisCS, visiting researcher @princeton_nlp, Google PhD Fellow.


Website: https://yumeng5.github.io/ · Joined: 02-11-2021 02:39:43

134 Tweets

1.1K Followers

278 Following

Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long

Xinyu Zhu (@tianhongzxy) 's Twitter Profile Photo

🔥The debate’s been wild: How does the reward in RLVR actually improve LLM reasoning?🤔
🚀Introducing our new paper👇
💡TL;DR: Just penalizing incorrect rollouts❌ — no positive reward needed — can boost LLM reasoning, and sometimes better than PPO/GRPO!

🧵[1/n]
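
To make the idea concrete, here is a minimal sketch of a negative-samples-only policy-gradient loss, assuming 0/1 verifiable rewards and one summed log-probability per sampled rollout; the function name and tensor shapes are illustrative assumptions, not the paper's released implementation.

```python
import torch

def negative_only_loss(rollout_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Sketch of a negative-samples-only policy-gradient loss.

    rollout_logprobs: (batch,) summed log-probability of each sampled rollout
    rewards:          (batch,) verifiable reward, 1.0 = correct, 0.0 = incorrect

    Only incorrect rollouts contribute: minimizing their log-probability pushes
    the model away from them, while correct rollouts get zero gradient, so the
    policy is never sharpened toward any particular positive sample.
    """
    incorrect = (rewards == 0).float()
    # Mean log-probability over incorrect rollouts; gradient descent lowers it.
    return (incorrect * rollout_logprobs).sum() / incorrect.sum().clamp(min=1.0)
```

Because correct rollouts receive no gradient under this kind of objective, the output distribution stays comparatively flat, which is the entropy argument picked up in the replies below.
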
Andrew Zhao (@andrewz45732491) 's Twitter Profile Photo

hmmm if you never push up, you maintain more entropy by not doing excessive sharpening. These guys might be onto something🧐

Mengzhou Xia (@xiamengzhou) 's Twitter Profile Photo

Surprisingly, we find training only with incorrect traces leads to strong performance 🤯 Even more interesting: it improves model diversity and test-time scaling—while correct traces do the opposite. Check out the 🧵👇

1a3orn (@1a3orn) 's Twitter Profile Photo

Oh man this is a gorgeous idea. Training *against* negative samples but not towards positive ones maintains entropy in the model, therefore increases pass@high k during RL.
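
For context, pass@k measures the chance that at least one of k samples solves the problem; the standard unbiased estimator (from Chen et al., 2021, not something introduced in this thread) computes it from n generations of which c are correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect generations, so any k-draw hits a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The tweet's point is that a higher-entropy policy keeps rollouts more diverse, so more problems end up with at least one correct sample among n, which raises pass@k at large k.
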

Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile Photo

Want powerful reasoning in LLMs without the massive RL training costs? 🤯 Our new paper (led by Siru Ouyang) explores transferring reasoning abilities directly from smaller LMs!🚀🚀

机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Paper: arxiv.org/pdf/2506.01347…
Code: github.com/TianHongZXY/RL…
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Only ding a model for making mistakes! It gives better results in RL and avoids mode collapse. We still understand so little about RL! But we’re learning. Your science dollars at work.

Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile Photo

Excited to share our #ICML25 paper (led by Zhepei Wei) on accelerating LLM decoding! ⚡️
AdaDecode predicts tokens early from intermediate layers
🙅‍♂️ No drafter model needed
🪶 Just lightweight LM heads
✨ Output consistency with standard autoregressive decoding
Thread👇
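
As a rough illustration only (not AdaDecode's actual algorithm; the head, threshold, and confidence gate below are assumptions), drafting a token early from an intermediate layer might look like:

```python
import torch

def maybe_draft_early(lm_head, hidden_state, threshold: float = 0.9):
    """Hypothetical confidence-gated early exit at an intermediate layer.

    hidden_state: hidden vector at some intermediate layer for the current position
    lm_head:      a lightweight LM head attached to that layer

    If the intermediate prediction is confident enough, return it as a draft
    token; otherwise signal that the remaining layers should run as usual.
    """
    probs = torch.softmax(lm_head(hidden_state), dim=-1)
    confidence, token = probs.max(dim=-1)
    if confidence.item() >= threshold:
        return token.item()  # draft now, verify with the full model later
    return None              # not confident: continue through the remaining layers
```

Drafted tokens would then have to be verified against the full model, which is how the output-consistency claim in the tweet can hold.
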

Jiaxin Huang (@jiaxinhuang0229) 's Twitter Profile Photo

🚀🚀Excited to share our new work on Speculative Decoding by Langlin Huang! We tackle a key limitation of draft models, which predict worse tokens at later positions, and present PosS, which generates high-quality drafts!
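
For background on why draft quality at later positions matters, here is the standard speculative-decoding acceptance rule (Leviathan et al., 2023), not PosS itself: the first rejected draft token discards everything after it, so weak late-position drafts directly cap the speedup.

```python
import torch

def count_accepted(draft_tokens, p_draft, p_target) -> int:
    """Standard speculative-decoding acceptance over a gamma-token draft.

    draft_tokens: (gamma,) tokens proposed by the draft model
    p_draft:      (gamma, vocab) draft-model probabilities at each position
    p_target:     (gamma, vocab) target-model probabilities at each position

    Each draft token is accepted with probability min(1, p_target / p_draft);
    the loop stops at the first rejection, dropping all later draft tokens.
    """
    accepted = 0
    for i, tok in enumerate(draft_tokens):
        ratio = p_target[i, tok] / p_draft[i, tok]
        if torch.rand(()) < torch.clamp(ratio, max=1.0):
            accepted += 1
        else:
            break
    return accepted
```
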

Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Excited to be at #CVPR2025 this week! I’ll be talking about tool-augmented multimodal reasoning in Thursday’s tutorial. Come say hi if you’re around🍻

⏰ 1:30–5:00 PM CDT, June 12
📍 Room 107 B, CVPR venue
Gautam Kamath (@thegautamkamath) 's Twitter Profile Photo

I wrote a short post on some etiquette for a seemingly mundane task: declining an offer (for a job, internship, grad school, etc). Link in next tweet. 1/2

Siru Ouyang (@siru_ouyang) 's Twitter Profile Photo

🚀 Finally live on arXiv after 3 weeks on hold 🤣 Check it out 👉 arxiv.org/abs/2506.15710 #Reasoning #reinforcementlearning #LLMs

CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Yu Meng @ ICLR'25 (@yumeng0818) 's Twitter Profile Photo

Will be at #ICML2025 next week! We'll present the following works:
🛠️ LarPO: Tue 7/15 (Poster Session 1 East)
🚀 AdaDecode: Wed 7/16 (Poster Session 3 East)
🧮 Negative Reinforcement for Reasoning: Fri 7/18 (AI for Math Workshop)
Happy to chat about latest research in LLMs🤩
Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

Thrilled to present three works at #ICML2025!🥳
🚀AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605)
🔢Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop
🤖WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents
Feel free to stop by and chat about #LLMs!