Tri Dao (@tri_dao)'s Twitter Profile
Tri Dao

@tri_dao

Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.

ID: 568879807

Link: https://tridao.me · Joined: 02-05-2012 07:13:50

795 Tweets

28.28K Followers

602 Following

Infini-AI-Lab (@infiniailab)'s Twitter Profile Photo

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n

Inception Labs (@inceptionailabs)'s Twitter Profile Photo

We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.

Tri Dao (@tri_dao)'s Twitter Profile Photo

Crazy that we now have an open-source model with 13B params that's competitive with o1. And Mamba layers help bring much higher inference throughput.
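
Rough intuition for that throughput claim, as a back-of-envelope sketch (all dimensions below are illustrative assumptions, not the actual model's config): attention layers keep a KV cache that grows with sequence length, while Mamba-style SSM layers carry a fixed-size recurrent state.

    # Back-of-envelope (illustrative dimensions, not a real config): attention's
    # KV cache grows linearly with sequence length; a Mamba-style SSM layer
    # keeps a fixed-size recurrent state per sequence.
    bytes_fp16 = 2
    n_layers, n_kv_heads, head_dim = 48, 8, 128   # assumed transformer shape
    d_model, d_state, expand = 4096, 16, 2        # assumed Mamba shape

    def kv_cache_bytes(seq_len: int) -> int:
        # K and V per layer: seq_len x n_kv_heads x head_dim each
        return n_layers * 2 * seq_len * n_kv_heads * head_dim * bytes_fp16

    def ssm_state_bytes() -> int:
        # state per layer: (expand * d_model) x d_state, independent of seq_len
        return n_layers * (expand * d_model) * d_state * bytes_fp16

    for seq_len in (1_000, 32_000, 128_000):
        print(f"{seq_len:>7} tokens: KV cache {kv_cache_bytes(seq_len) / 2**30:.2f} GiB"
              f" vs. SSM state {ssm_state_bytes() / 2**20:.1f} MiB")

A smaller per-sequence decode state means more concurrent sequences fit in GPU memory, which is where the batch-throughput gain comes from.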

Together AI (@togethercompute)'s Twitter Profile Photo

Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models.

Built in

AI21 Labs (@ai21labs)'s Twitter Profile Photo

Now live. A new update to our Jamba open model family 🎉

Same hybrid SSM-Transformer architecture, 256K context window, efficiency gains & open weights.

Now with improved grounding & instruction following.

Try it on AI21 Studio or download from Hugging Face 🤗

More on what

Tri Dao (@tri_dao)'s Twitter Profile Photo

Turns out you can get length generalization for recurrent models by simply training for an extra 100 steps with a careful choice of initial states
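
The tweet doesn't spell out the recipe, so the following is only a minimal sketch of the stated idea under my own assumptions: keep training briefly, but draw the initial recurrent state from realistic mid-sequence states instead of always starting from zero. The GRU, shapes, and objective here are placeholders, not the actual setup.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
    head = nn.Linear(128, 64)
    opt = torch.optim.AdamW(list(rnn.parameters()) + list(head.parameters()), lr=1e-4)

    def sample_initial_state(batch: int) -> torch.Tensor:
        # "Careful choice of initial states" (assumed realization): collect h0 by
        # running the model over a warm-up chunk, so training sees realistic
        # non-zero starting states rather than all-zeros.
        warmup = torch.randn(batch, 256, 64)  # stand-in for real data
        with torch.no_grad():
            _, h0 = rnn(warmup)
        return h0

    for step in range(100):  # the "extra 100 steps"
        x = torch.randn(8, 512, 64)  # stand-in for real training sequences
        out, _ = rnn(x, sample_initial_state(batch=8))
        loss = (head(out) - x).pow(2).mean()  # toy objective
        opt.zero_grad()
        loss.backward()
        opt.step()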

Liliang Ren (@liliang_ren)'s Twitter Profile Photo

Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮
Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput

Mayank Mishra (@mayankmish98)'s Twitter Profile Photo

🦆 QuACK: blazing-fast CuTe-DSL GPU kernels with 3 TB/s goodness! Optimizing your kernels as much as possible is important... unless you are okay with leaving throughput on the table. Check out this work from vlaw, Ted Zadouri and Tri Dao

Ted Zadouri (@tedzadouri)'s Twitter Profile Photo

CuTe DSL feels almost unreal: minimal Python code hits peak memory throughput on H100, as we show in QuACK. Can't wait for the addition of kernels optimized for Blackwell in QuACK 🦆
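
For context on the number being celebrated: "peak memory throughput" means a memory-bound kernel moves bytes at close to the GPU's HBM bandwidth (roughly 3.35 TB/s on an H100 SXM, hence figures like 3 TB/s). Below is a rough PyTorch harness for that metric, with an assumed shape and op; this is not QuACK's own benchmark code.

    import torch

    def achieved_tbps(fn, x: torch.Tensor, iters: int = 100) -> float:
        # Time fn with CUDA events and convert to achieved memory bandwidth.
        fn(x)  # warm-up
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn(x)
        end.record()
        torch.cuda.synchronize()
        seconds = start.elapsed_time(end) / 1e3 / iters  # elapsed_time is in ms
        bytes_moved = 2 * x.numel() * x.element_size()   # approx. one read + one write
        return bytes_moved / seconds / 1e12

    x = torch.randn(8192, 8192, device="cuda", dtype=torch.bfloat16)
    bw = achieved_tbps(lambda t: torch.softmax(t, dim=-1), x)
    print(f"softmax achieved ~{bw:.2f} TB/s")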

Princeton Computer Science (@princetoncs)'s Twitter Profile Photo

Congrats to Parastoo Abtahi, Tri Dao and Alex Lombardi on being named 2025 Google Research Scholars. 🎉

The @googleresearch scholars program funds world-class research conducted by early-career professors. 

bit.ly/4kvpvFx

Tri Dao (@tri_dao)'s Twitter Profile Photo

I played with it for an hour. Went through my usual prompts (math derivations, floating-point optimizations, …). It's a good model, feels comparable to the best frontier models

Together AI (@togethercompute)'s Twitter Profile Photo

🚨MAJOR DROP: Kimi K2 just landed on Together AI 🚀

An open-source 1T parameter model that beats proprietary LLMs in creativity, coding, and tool use while delivering 60-70% cost savings.

Built for agents. Priced for scale. 👇

Yong Lin (@yong18850571)'s Twitter Profile Photo

(1/4)🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B
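
For readers unfamiliar with the metric: Pass@k figures like the Pass@32 above are typically computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021), which estimates, from n samples per problem of which c pass, the probability that at least one of k drawn samples passes. A minimal sketch (the example numbers are hypothetical):

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Unbiased estimator: pass@k = 1 - C(n - c, k) / C(n, k)
        if n - c < k:  # fewer than k failures: any k-subset contains a pass
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Hypothetical: 64 samples per problem, 10 of which pass, estimating pass@32
    print(f"{pass_at_k(n=64, c=10, k=32):.3f}")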