
Qiying Yu
@qiying_yu
PhD student at Tsinghua AIR @Tsinghua_Uni @AIRTHU1201
ID: 1627292036868214784
https://yqy2001.github.io 19-02-2023 13:00:54
80 Tweets
529 Followers
705 Following





New RL method that's better than GRPO! 🤯 ByteDance Open Source released a new open-source RL method that outperforms GRPO. DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) achieves 50 points on the AIME 2024 benchmark with 50% fewer training steps. TL;DR: 🏆 50%


To my RL heroes, DAPO! 😂 Who’s got insights? Let’s make some magic happen. 🔗 dapo-sia.github.io will brown Alexander Doria Unsloth AI Axolotl Hugging Face (trl/open-r1)




A new RL algorithm! DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) from ByteDance Open Source is a fully open-source RL system that improves training in long Chain-of-Thought (CoT) reasoning. It achieves 50 points on AIME 2024, surpassing DeepSeek-R1-Zero, using only



🚀 Introducing VAPO (Value-based Augmented PPO), our latest RL method for reasoning models. Trained from the Qwen-32B base model, VAPO achieves 60.4 on AIME 2024, outperforming DeepSeek-zero-32B and DAPO-32B 📈. Built with the verl project, and yes, we will open-source it soon. Key



