Qiying Yu (@qiying_yu)'s Twitter Profile
Qiying Yu

@qiying_yu

PhD student at Tsinghua AIR @Tsinghua_Uni @AIRTHU1201

ID: 1627292036868214784

Website: https://yqy2001.github.io · Joined: 19-02-2023 13:00:54

80 Tweets

529 Followers

705 Following

Haibin (@eric_haibin_lin)'s Twitter Profile Photo

Qiying Yu and team just dropped the DAPO algorithm (decoupled clip and dynamic sampling policy optimization)! DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32B, and scores 50 on AIME 2024 with 50% fewer steps. It is trained with

<a href="/qiying_yu/">Qiying Yu</a> and team just dropped the DAPO algorithm (decoupled clip and dynamic sampling policy optimization)! DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32B, and scores 50 on AIME 2024 with 50% fewer steps. It is trained with
Marktechpost AI Research News ⚡ (@marktechpost)'s Twitter Profile Photo

ByteDance Research Releases DAPO: A Fully Open-Sourced LLM Reinforcement Learning System at Scale

Researchers from ByteDance, Tsinghua University, and the University of Hong Kong recently introduced DAPO (Dynamic Sampling Policy Optimization), an open-source large-scale
𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

DAPO is a reinforcement learning algorithm for large-scale LLM training, achieving 50 points on AIME 2024 with Qwen2.5-32B. It introduces four key techniques to improve LLM reasoning and provides open-source
Kyle Corbitt (@corbtt)'s Twitter Profile Photo

Lots of good nuggets here. Interestingly, they completely drop the KL divergence penalty and get good results. This mirrors what we're finding in our own experiments. Seems not to be so necessary for RLVR with GRPO. As a bonus, skipping it speeds up training significantly!
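
For readers who want the mechanics: below is a minimal sketch of a GRPO-style clipped token loss with the KL penalty simply omitted, as the tweet above describes. This is not the authors' code; the tensor shapes, the `clip_eps` default, and the helper name are assumptions for illustration.

```python
import torch

def clipped_policy_loss_no_kl(logprobs, old_logprobs, advantages, mask, clip_eps=0.2):
    """PPO/GRPO-style clipped surrogate loss with the KL-to-reference penalty dropped.

    logprobs, old_logprobs : (batch, seq) token log-probs under current / rollout policy
    advantages             : (batch, 1) group-normalized advantages (GRPO-style)
    mask                   : (batch, seq) 1 for response tokens, 0 for padding
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    per_token = -torch.minimum(unclipped, clipped)
    # Note: no `+ beta * KL(policy || reference)` term here -- with verifiable
    # rewards (RLVR) the reference-model penalty is dropped entirely.
    return (per_token * mask).sum() / mask.sum()
```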

Philipp Schmid (@_philschmid)'s Twitter Profile Photo

New RL Method that's better than GRPO! 🤯 ByteDance Open Source released a new open-source RL method that outperforms GRPO. DAPO, or Decoupled Clip and Dynamic sAmpling Policy Optimization, achieves 50 points on the AIME 2024 benchmark with 50% fewer training steps.

TL;DR: 
🏆 50%
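
On the "Dynamic Sampling" part of the name: the DAPO report describes oversampling and filtering out prompts whose sampled group is uniformly correct or uniformly wrong, since those give zero group-normalized advantage. Below is a rough sketch under that reading; `sample_group` and the 0/1 verifier rewards are hypothetical stand-ins, not any released API.

```python
def dynamic_sampling(prompts, sample_group, batch_size):
    """Keep only prompts whose sampled responses have mixed outcomes.

    `sample_group(prompt)` is assumed to return one 0/1 verifier reward per
    sampled response. All-correct or all-wrong groups carry no learning signal
    under group-normalized advantages, so they are skipped and more prompts
    are consumed until the batch is full.
    """
    kept = []
    for prompt in prompts:
        rewards = sample_group(prompt)
        if 0 < sum(rewards) < len(rewards):  # keep mixed-outcome groups only
            kept.append((prompt, rewards))
        if len(kept) == batch_size:
            break
    return kept
```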
PapersAnon (@papers_anon)'s Twitter Profile Photo

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

From a joint ByteDance/Tsinghua team. Proposes the Decoupled Clip and Dynamic sAmpling Policy Optimization algorithm and fully open-sources a SOTA large-scale RL system. Both were used to achieve 50 points on AIME
elvis (@omarsar0)'s Twitter Profile Photo

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

It introduces DAPO, a fully open-source, large-scale RL system that boosts the chain-of-thought reasoning capabilities of LLMs.

DAPO raises the upper clipping threshold (“Clip-Higher”) in PPO-style training,
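
In code terms, "Clip-Higher" decouples the two clipping bounds so the upper bound can sit above the usual symmetric value, letting low-probability tokens gain mass. Below is a hedged sketch; the epsilon values are illustrative defaults, not a claim about the paper's exact settings.

```python
import torch

def clip_higher_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Asymmetric ('decoupled') PPO clipping: the upper bound 1 + eps_high is
    raised above the usual symmetric value while the lower bound stays at
    1 - eps_low, so unlikely-but-useful tokens are clipped less aggressively."""
    clipped_ratio = torch.clamp(ratio, 1 - eps_low, 1 + eps_high)
    return -torch.minimum(ratio * advantage, clipped_ratio * advantage)
```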
Qiying Yu (@qiying_yu)'s Twitter Profile Photo

Thank you AK for featuring our work. Excited to share valuable insights and open-source useful systems to the community! 🌟

TuringPost (@theturingpost)'s Twitter Profile Photo

A new RL algorithm!

DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) from ByteDance Open Source is a fully open-source RL system that improves training for long Chain-of-Thought (CoT) reasoning.

It achieves 50 points on AIME 2024, surpassing DeepSeek-R1-Zero, using only
AK (@_akhaliq)'s Twitter Profile Photo

China's ByteDance presents VAPO

Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Presents VAPO, the Value-based Augmented Proximal Policy Optimization framework for reasoning models, a novel framework tailored for reasoning models within the value-based
Haibin (@eric_haibin_lin)'s Twitter Profile Photo

🚀 Introducing VAPO (Value-based augmented PPO), our latest RL method for reasoning models. Trained from Qwen-32B-base model, VAPO achieves 60.4 on AIME 2024, outperforming DeepSeek-zero-32B and DAPO-32B📈. 

Built with the verl project, and yes, we will open source it soon.

Key
Quentin Gallouédec (@qgallouedec)'s Twitter Profile Photo

Overlong filtering has been shown to significantly stabilize learning and improve performance. You can now use it in TRL!

It simply consists of masking the loss of truncated samples.

Principle proposed by Qiying Yu in DAPO, implemented by Shirin Yamani 👏
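
The gist in code: zero out the loss mask for any completion that never emitted the end-of-sequence token, i.e. one that was cut off by the generation length limit. A minimal sketch with hypothetical tensor names; this is not TRL's actual implementation.

```python
import torch

def overlong_filtering(loss_mask, completion_ids, eos_token_id):
    """Mask out the loss of truncated (overlong) completions.

    A completion with no EOS token was cut off by the length limit; instead of
    penalizing its (possibly sound) partial reasoning, its tokens are simply
    excluded from the loss.
    """
    finished = (completion_ids == eos_token_id).any(dim=-1, keepdim=True)  # (batch, 1)
    return loss_mask * finished.to(loss_mask.dtype)
```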
Qiying Yu (@qiying_yu)'s Twitter Profile Photo

#ICLR2025
I am going to present VAPO & DAPO twice at ICLR, two SOTA LLM RL algorithms.

1. The 1-2 pm verl Expo Talk, Apr 26, Peridot 202-203
2. The 3:00-3:30 pm break, Apr 24, at the ByteDance Booth

Welcome and see you there!