Qinqing Zheng (@qqyuzu)'s Twitter Profile
Qinqing Zheng

@qqyuzu

Reinforcement Learning, Generative Modeling @ FAIR (@AIatMeta). PhD @UChicago.

ID: 1492206962506964993

Website: http://enosair.github.io

Joined: 11-02-2022 18:41:33

52 Tweets

546 Followers

161 Following

Qinqing Zheng (@qqyuzu)'s Twitter Profile Photo

Introducing Dualformer: a new model that integrates fast and slow thinking! By learning with randomized reasoning traces, Dualformer offers both quick response and enhanced performance with more succinct CoTs. arxiv.org/pdf/2410.09918 w/ Andy Mike Sainbayar Sukhbaatar Yuandong Tian

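A minimal sketch of the randomized-trace idea in Python (function name and probabilities are illustrative assumptions, not the paper's exact dropping schedule, which uses structured strategies over search traces): during training, the reasoning trace attached to each example is randomly dropped in whole or in part, so a single model learns to answer both with and without a chain of thought.

```python
import random

def randomize_trace(trace_steps, p_drop_all=0.25, p_keep_step=0.7):
    """Illustrative Dualformer-style trace randomization (hypothetical sketch)."""
    if random.random() < p_drop_all:
        return []  # "fast" (System 1) example: prompt -> answer, no trace
    # "slow" (System 2) example with a randomly shortened trace
    return [step for step in trace_steps if random.random() < p_keep_step]

# A training sequence is then: prompt + randomize_trace(trace) + final answer.
```
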
Yuandong Tian (@tydsh)'s Twitter Profile Photo

🚀🎯 Dualformer, our simple yet novel training paradigm that leads to:
1️⃣ Emergent behavior: the model automatically switches between System 1 (fast) and System 2 (slow) thinking.
2️⃣ Better performance than System 1 or System 2 models alone, on Maze navigation, Sokoban, and even math reasoning tasks.
3️⃣

Kevin Patrick Murphy (@sirbayes)'s Twitter Profile Photo

Excited to share our new paper on "Diffusion Model Predictive Control" (D-MPC). Key idea: leverage diffusion models to learn a trajectory-level (not just single-step) world model to mitigate compounding errors when doing rollouts. arxiv.org/abs/2410.05364
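
A hedged sketch of the D-MPC idea (the function and `world_model.sample` interface are assumptions, not the paper's API): sample whole trajectories from the learned diffusion world model, score them, execute only the first action, and replan at every step.

```python
import numpy as np

def d_mpc_step(world_model, reward_fn, obs, num_samples=64, horizon=16):
    # Sample candidate (state, action) trajectories jointly from a
    # trajectory-level diffusion model rather than rolling out a
    # single-step model -- this is what limits compounding errors.
    trajs = world_model.sample(obs, num_samples=num_samples, horizon=horizon)
    scores = [sum(reward_fn(s, a) for s, a in traj) for traj in trajs]
    best = trajs[int(np.argmax(scores))]
    return best[0][1]  # execute the first action, then replan next step
```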

Qinqing Zheng (@qqyuzu)'s Twitter Profile Photo

ONI offers concurrent policy training & reward synthesis, a good fit for long-horizon, sparse-reward problems! I also believe it has great potential to be extended to multimodal inputs and complex planning/reasoning environments!
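
As a rough sketch of what "concurrent" means here (all names hypothetical; ONI's actual system is more involved), reward synthesis and policy updates can run as decoupled loops sharing a buffer:

```python
import queue
import threading

obs_queue = queue.Queue()
labeled = []  # shared buffer of (observation, synthesized_reward) pairs

def annotate_forever(synthesize_reward):
    # e.g. an LLM-based reward model labeling observations as they stream in
    while True:
        obs = obs_queue.get()
        labeled.append((obs, synthesize_reward(obs)))

def learn_forever(update_policy, batch_size=256):
    # the policy trains on whatever has been labeled so far;
    # neither loop ever blocks the other
    while True:
        if len(labeled) >= batch_size:
            update_policy(labeled[-batch_size:])

# threading.Thread(target=annotate_forever, args=(reward_fn,), daemon=True).start()
# threading.Thread(target=learn_forever, args=(policy_update,), daemon=True).start()
```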

Yifei Wang (@yifeiwang77)'s Twitter Profile Photo

Great to see a reviving interest in long-context LLMs these days (kudos to awesome evals and archs)! But are you training long-context LLMs wisely (to save the huge cost)?

In a recent #ICLR2025 paper, we show that vanilla next-token prediction could be very suboptimal(!!) for
AI at Meta (@aiatmeta)'s Twitter Profile Photo

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick —  our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
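
For readers unfamiliar with the "active parameter" framing: Llama 4 uses a mixture-of-experts design, so only a subset of the weights runs for each token. A toy top-1 MoE layer (purely illustrative, not Meta's implementation) shows the distinction between total and active parameters:

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy mixture-of-experts layer. Total parameters grow with num_experts,
    but each token activates a single expert, so the active parameter count
    per token stays small -- the sense in which Scout is 17B-active."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):  # x: (num_tokens, dim)
        expert_idx = self.router(x).argmax(dim=-1)  # pick one expert per token
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])  # only the chosen expert runs
        return out
```
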
lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

BREAKING: Meta's Llama 4 Maverick just hit #2 overall - becoming the 4th org to break 1400+ on Arena!🔥

Highlights:
- #1 open model, surpassing DeepSeek
- Tied #1 in Hard Prompts, Coding, Math, Creative Writing
- Huge leap over Llama 3 405B: 1268 → 1417
- #5 under style control
Artificial Analysis (@artificialanlys)'s Twitter Profile Photo

How many tokens do reasoning models use vs. non-reasoning? We've measured up to a 20X difference

Key takeaways:
➤ Reasoning models (models that ‘think’ before answering) use up to 20x more tokens than non-reasoning models. Claude 3.7 Sonnet Thinking (64k token budget) used ~15X
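
One way to reproduce this kind of measurement yourself (a sketch using the OpenAI Python SDK; the model names are placeholders, and other providers expose similar usage fields):

```python
from openai import OpenAI

client = OpenAI()

def completion_tokens(model: str, prompt: str) -> int:
    # usage.completion_tokens includes a reasoning model's hidden
    # "thinking" tokens, which is what drives the large token gap.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.usage.completion_tokens

# e.g. completion_tokens("o3-mini", q) / completion_tokens("gpt-4o-mini", q)
```
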
Noam Brown (@polynoamial)'s Twitter Profile Photo

Our new OpenAI o3 and o4-mini models further confirm that scaling inference improves intelligence, and that scaling RL shifts up the whole compute vs. intelligence curve. There is still a lot of room to scale both of these further.

Zhuang Liu (@liuzhuang1234)'s Twitter Profile Photo

Accepted to #ICML 25 & also recently featured in CMU news and Fast Company: cs.cmu.edu/news/2025/llm-… fastcompany.com/91286162/ai-ch…

Saining Xie (@sainingxie)'s Twitter Profile Photo

Wow, Deeply Supervised Nets received the Test of Time Award at AISTATS 2025! It was the very first paper I submitted during my PhD. Fun fact: the paper was originally rejected by NeurIPS with scores of 8/8/7 (yes, that pain stuck with me... maybe now I can finally let it

Kevin Frans (@kvfrans)'s Twitter Profile Photo

Stare at policy improvement and diffusion guidance, and you may notice a suspicious similarity...

We lay out an equivalence between the two, formalizing a simple technique (CFGRL) to improve performance across-the-board when training diffusion policies.

arxiv.org/abs/2505.23458
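
The core move is classifier-free guidance applied to a diffusion policy. A minimal sketch (the model interface and the "optimal" conditioning token are assumptions, not the paper's exact formulation):

```python
def cfgrl_eps(model, x_t, t, obs, w=2.0):
    # Classifier-free guidance over actions: extrapolate from the
    # unconditional (behavior-policy) noise prediction toward the one
    # conditioned on optimality. With w > 1 this sharpens the sampled
    # action distribution, playing the role of a policy-improvement step.
    eps_uncond = model(x_t, t, obs, cond=None)
    eps_cond = model(x_t, t, obs, cond="optimal")
    return eps_uncond + w * (eps_cond - eps_uncond)
```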