Qinqing Zheng (@qqyuzu)'s Twitter Profile
Qinqing Zheng

@qqyuzu

Reinforcement Learning, Generative Modeling @ FAIR (@AIatMeta). PhD @UChicago.

ID: 1492206962506964993

Website: http://enosair.github.io

Joined: 11-02-2022 18:41:33

52 Tweets

546 Followers

161 Following

Qinqing Zheng (@qqyuzu)'s Twitter Profile Photo

Introducing Dualformer: a new model that integrates fast and slow thinking! By learning with randomized reasoning traces, Dualformer offers both quick response and enhanced performance with more succinct CoTs. arxiv.org/pdf/2410.09918 w/ Andy Mike Sainbayar Sukhbaatar Yuandong Tian

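A minimal sketch of the randomized-trace idea in Python (function name and probabilities are illustrative assumptions, not the paper's exact dropping schedule, which uses structured strategies over search traces): during training, the reasoning trace attached to each example is randomly dropped in whole or in part, so a single model learns to answer both with and without a chain of thought.

```python
import random

def randomize_trace(trace_steps, p_drop_all=0.25, p_keep_step=0.7):
    """Illustrative Dualformer-style trace randomization (hypothetical sketch)."""
    if random.random() < p_drop_all:
        return []  # "fast" (System 1) example: prompt -> answer, no trace
    # "slow" (System 2) example with a randomly shortened trace
    return [step for step in trace_steps if random.random() < p_keep_step]

# A training sequence is then: prompt + randomize_trace(trace) + final answer.
```
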
Yuandong Tian (@tydsh)'s Twitter Profile Photo

🚀🎯 Dualformer, our simple yet novel training paradigm that leads to:
1️⃣ Emergent behavior: the model automatically switches between System 1 (fast) and System 2 (slow) thinking.
2️⃣ Better performance than System 1 or System 2 models alone, on Maze navigation, Sokoban, and even math reasoning tasks.
3️⃣

Kevin Patrick Murphy (@sirbayes)'s Twitter Profile Photo

Excited to share our new paper on "Diffusion Model Predictive Control" (D-MPC). Key idea: leverage diffusion models to learn a trajectory-level (not just single-step) world model to mitigate compounding errors when doing rollouts. arxiv.org/abs/2410.05364
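
A hedged sketch of the D-MPC idea (the function and `world_model.sample` interface are assumptions, not the paper's API): sample whole trajectories from the learned diffusion world model, score them, execute only the first action, and replan at every step.

```python
import numpy as np

def d_mpc_step(world_model, reward_fn, obs, num_samples=64, horizon=16):
    # Sample candidate (state, action) trajectories jointly from a
    # trajectory-level diffusion model rather than rolling out a
    # single-step model -- this is what limits compounding errors.
    trajs = world_model.sample(obs, num_samples=num_samples, horizon=horizon)
    scores = [sum(reward_fn(s, a) for s, a in traj) for traj in trajs]
    best = trajs[int(np.argmax(scores))]
    return best[0][1]  # execute the first action, then replan next step
```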

Qinqing Zheng (@qqyuzu)'s Twitter Profile Photo

ONI offers concurrent policy training & reward synthesis, a good fit for long-horizon, sparse-reward problems! I also believe it has great potential to be extended to multimodal inputs and complex planning/reasoning environments!
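
As a rough sketch of what "concurrent" means here (all names hypothetical; ONI's actual system is more involved), reward synthesis and policy updates can run as decoupled loops sharing a buffer:

```python
import queue
import threading

obs_queue = queue.Queue()
labeled = []  # shared buffer of (observation, synthesized_reward) pairs

def annotate_forever(synthesize_reward):
    # e.g. an LLM-based reward model labeling observations as they stream in
    while True:
        obs = obs_queue.get()
        labeled.append((obs, synthesize_reward(obs)))

def learn_forever(update_policy, batch_size=256):
    # the policy trains on whatever has been labeled so far;
    # neither loop ever blocks the other
    while True:
        if len(labeled) >= batch_size:
            update_policy(labeled[-batch_size:])

# threading.Thread(target=annotate_forever, args=(reward_fn,), daemon=True).start()
# threading.Thread(target=learn_forever, args=(policy_update,), daemon=True).start()
```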

Yifei Wang (@yifeiwang77)'s Twitter Profile Photo

Great to see a reviving interest in long-context LLMs these days (kudos to awesome evals and archs)! But are you training long-context LLMs wisely (to save the huge cost)?

In a recent #ICLR2025 paper, we show that vanilla next-token prediction could be very suboptimal(!!) for
AI at Meta (@aiatmeta)'s Twitter Profile Photo

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick —  our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
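
For readers unfamiliar with the "active parameter" framing: Llama 4 uses a mixture-of-experts design, so only a subset of the weights runs for each token. A toy top-1 MoE layer (purely illustrative, not Meta's implementation) shows the distinction between total and active parameters:

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy mixture-of-experts layer. Total parameters grow with num_experts,
    but each token activates a single expert, so the active parameter count
    per token stays small -- the sense in which Scout is 17B-active."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):  # x: (num_tokens, dim)
        expert_idx = self.router(x).argmax(dim=-1)  # pick one expert per token
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])  # only the chosen expert runs
        return out
```
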
lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

BREAKING: Meta's Llama 4 Maverick just hit #2 overall - becoming the 4th org to break 1400+ on Arena!🔥

Highlights:
- #1 open model, surpassing DeepSeek
- Tied #1 in Hard Prompts, Coding, Math, Creative Writing
- Huge leap over Llama 3 405B: 1268 → 1417
- #5 under style control
Artificial Analysis (@artificialanlys)'s Twitter Profile Photo

How many tokens do reasoning models use vs. non-reasoning? We've measured up to a 20X difference

Key takeaways:
➤ Reasoning models (models that ‘think’ before answering) use up to 20x more tokens than non-reasoning models. Claude 3.7 Sonnet Thinking (64k token budget) used ~15X
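
One way to reproduce this kind of measurement yourself (a sketch using the OpenAI Python SDK; the model names are placeholders, and other providers expose similar usage fields):

```python
from openai import OpenAI

client = OpenAI()

def completion_tokens(model: str, prompt: str) -> int:
    # usage.completion_tokens includes a reasoning model's hidden
    # "thinking" tokens, which is what drives the large token gap.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.usage.completion_tokens

# e.g. completion_tokens("o3-mini", q) / completion_tokens("gpt-4o-mini", q)
```
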
Noam Brown (@polynoamial)'s Twitter Profile Photo

Our new OpenAI o3 and o4-mini models further confirm that scaling inference improves intelligence, and that scaling RL shifts up the whole compute vs. intelligence curve. There is still a lot of room to scale both of these further.

Zhuang Liu (@liuzhuang1234)'s Twitter Profile Photo

Accepted to #ICML 25 & also recently featured in CMU news and Fast Company: cs.cmu.edu/news/2025/llm-… fastcompany.com/91286162/ai-ch…

Saining Xie (@sainingxie)'s Twitter Profile Photo

Wow, Deeply Supervised Nets received the Test of Time Award at AISTATS 2025! It was the very first paper I submitted during my PhD. Fun fact: the paper was originally rejected by NeurIPS with scores of 8/8/7 (yes, that pain stuck with me... maybe now I can finally let it

Kevin Frans (@kvfrans)'s Twitter Profile Photo

Stare at policy improvement and diffusion guidance, and you may notice a suspicious similarity...

We lay out an equivalence between the two, formalizing a simple technique (CFGRL) to improve performance across-the-board when training diffusion policies.

arxiv.org/abs/2505.23458
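
The core move is classifier-free guidance applied to a diffusion policy. A minimal sketch (the model interface and the "optimal" conditioning token are assumptions, not the paper's exact formulation):

```python
def cfgrl_eps(model, x_t, t, obs, w=2.0):
    # Classifier-free guidance over actions: extrapolate from the
    # unconditional (behavior-policy) noise prediction toward the one
    # conditioned on optimality. With w > 1 this sharpens the sampled
    # action distribution, playing the role of a policy-improvement step.
    eps_uncond = model(x_t, t, obs, cond=None)
    eps_cond = model(x_t, t, obs, cond="optimal")
    return eps_uncond + w * (eps_cond - eps_uncond)
```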