Tianqi Chen (@tqchenml)'s Twitter Profile
Tianqi Chen

@tqchenml

AssistProf @CarnegieMellon. Chief Technologist @OctoML. Creator of @XGBoostProject, @ApacheTVM. Member catalyst.cs.cmu.edu, @TheASF. Views are my own.

ID: 3187990776

Link: https://tqchen.com/ · Joined: 07-05-2015 19:10:39

1.1K Tweets

17.17K Followers

1.1K Following

Matei Zaharia (@matei_zaharia):

Excited to launch Agent Bricks, a new way to build auto-optimized agents on your tasks. Agent Bricks uniquely takes a *declarative* approach to agent development: you tell us what you want, and we auto-generate evals and optimize the agent. databricks.com/blog/introduci…

Yixin Dong (@yi_xin_dong):

Databricks' Agent Bricks is powered by XGrammar for structured generation, and achieves high quality and efficiency. It helps you complete AI tasks without needing to worry about the algorithmic details. Give it a try!
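The core idea behind structured-generation libraries like XGrammar can be illustrated with a toy sketch (this is not the XGrammar API — the vocabulary, the hardcoded "grammar", and the uniform logits below are all illustrative assumptions): at each decoding step, tokens the grammar disallows are masked out of the logits, so the model can only emit structurally valid output.

```python
# Illustrative sketch of grammar-constrained decoding (not the XGrammar API).
# A toy deterministic "grammar" that only accepts the token sequence
# for the JSON object {"ok": true} or {"ok": false}.

import math

VOCAB = ['{', '"ok"', ':', ' ', 'true', 'false', '}']

# At each decoding state, only these tokens are grammatically legal.
ALLOWED = [
    {'{'}, {'"ok"'}, {':'}, {' '}, {'true', 'false'}, {'}'},
]

def constrained_decode(logits_per_step):
    """Greedy decode, but mask disallowed tokens to -inf first."""
    out = []
    for state, logits in enumerate(logits_per_step):
        masked = [
            logit if tok in ALLOWED[state] else -math.inf
            for tok, logit in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        out.append(VOCAB[best])
    return ''.join(out)

# Uniform "model" logits: the grammar mask alone forces valid structure.
steps = [[0.0] * len(VOCAB) for _ in range(6)]
print(constrained_decode(steps))  # -> {"ok": true}
```

Real systems apply the same masking per step over a full context-free grammar and a tokenizer's vocabulary, which is where the engineering difficulty (and XGrammar's efficiency work) lies.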

NVIDIA AI Developer (@nvidiaaidev):

🔍 Our deep-dive blog covering our winning MLSys paper on FlashInfer is now live ➡️ nvda.ws/3ZA1Hca

Accelerate LLM inference with FlashInfer—NVIDIA's high-performance, JIT-compiled library built for ultra-efficient transformer inference on GPUs.

Go under the hood with

LMSYS Org (@lmsysorg):

The SGLang team just ran DeepSeek 671B on NVIDIA's GB200 NVL72, unlocking 7,583 toks/sec/GPU for decoding w/ PD disaggregation + large-scale expert parallelism — 2.7× faster than H100. Don't miss this work! 🔥

Thanks to Pen Li from NVIDIA who kicked off this collaboration and

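A quick back-of-envelope check on the numbers above, assuming the NVL72 rack holds 72 GPUs and the 2.7× figure compares per-GPU decode throughput (both assumptions, not stated in the post):

```python
# Back-of-envelope arithmetic for the quoted SGLang GB200 NVL72 results.
# Assumptions: NVL72 = 72 GPUs; 2.7x compares per-GPU decode rates.

per_gpu = 7583          # tokens/sec/GPU on GB200 NVL72 (from the post)
speedup = 2.7           # claimed speedup vs H100 (from the post)
gpus = 72               # assumed GPU count in one NVL72 rack

aggregate = per_gpu * gpus           # whole-rack decode throughput
h100_per_gpu = per_gpu / speedup     # implied H100 baseline

print(f"aggregate: {aggregate:,} tok/s")              # 545,976 tok/s
print(f"implied H100: {h100_per_gpu:,.0f} tok/s/GPU") # ~2,809
```

So the per-GPU number implies roughly half a million tokens/sec of aggregate decode throughput per rack under these assumptions.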
zhyncs (@zhyncs42):

SGLang is an early user of FlashInfer and witnessed its rise as the de facto LLM inference kernel library. It won best paper at MLSys 2025, and Zihao now leads its development at NVIDIA. SGLang's GB200 NVL72 optimizations were made possible with strong support from the

NVIDIA AI Developer (@nvidiaaidev):

.LMSYS Org (SGLang) now achieves 7,583 tokens per second per GPU running DeepSeek R1 on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at

Infini-AI-Lab (@infiniailab):

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n

Tianqi Chen (@tqchenml):

Check out our work on parallel reasoning 🧠. We introduce an AI-assisted curator that identifies parallel paths in sequential traces, then tune models into native parallel thinkers that run efficiently with prefix sharing and batching. Really excited about this general direction

Beidi Chen (@beidichen):

Say hello to Multiverse — the Everything Everywhere All At Once of generative modeling. 💥 Lossless, adaptive, and gloriously parallel 🌀 Now open-sourced: multiverse4fm.github.io I was amazed how easily we could extract the intrinsic parallelism of even SOTA autoregressive

Xinyu Yang (@xinyu2ml):

🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level

Zhihao Jia (@jiazhihao):

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀 Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized

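The intuition for why megakernel fusion helps can be captured in a toy latency model (all numbers below are illustrative assumptions, not measurements of MPK or any real GPU): per-kernel launch overhead is paid once per decoding step instead of once per operator.

```python
# Toy latency model for kernel fusion. Illustrative numbers only:
# real launch overheads and op costs vary by GPU and workload.

launch_us = 5.0    # assumed per-kernel launch overhead (microseconds)
compute_us = 2.0   # assumed compute time per op (microseconds)
n_ops = 100        # assumed ops in one decoding step

# Separate kernels: overhead paid for every op.
separate = n_ops * (launch_us + compute_us)

# One fused megakernel: overhead paid once for the whole step.
fused = launch_us + n_ops * compute_us

print(f"separate: {separate} us, fused: {fused} us")  # 700.0 vs 205.0
```

When per-op compute is small relative to launch overhead, as in low-latency decoding, the overhead term dominates, which is the regime where fusing everything into one persistent kernel pays off most.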
Chris Donahue (@chrisdonahuey):

Excited to announce 🎵Magenta RealTime, the first open weights music generation model capable of real-time audio generation with real-time control. 👋 **Try Magenta RT on Colab TPUs**: colab.research.google.com/github/magenta… 👀 Blog post: g.co/magenta/rt 🧵 below

Mehdi Amini (@jokereph):

I’ve started collaborating with the folks building FlashInfer: nice project and a pretty amazing set of people! Zihao Ye, Tianqi Chen, and everyone.

Zhihao Jia (@jiazhihao):

📢 Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org.

We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to

Tianqi Chen (@tqchenml):

#MLSys2026 will be led by the general chair Luis Ceze and PC chairs Zhihao Jia and Aakanksha Chowdhery. The conference will be held in Bellevue on Seattle's east side. Consider submitting and bringing your latest works in AI and systems—more details at mlsys.org.

Banghua Zhu (@banghuaz):

Excited to share that I’m joining NVIDIA as a Principal Research Scientist!

We’ll be joining forces on efforts in model post-training, evaluation, agents, and building better AI infrastructure—with a strong emphasis on collaboration with developers and academia. We’re committed
