Tianqi Chen (@tqchenml)'s Twitter Profile
Tianqi Chen

@tqchenml

AssistProf @CarnegieMellon. Chief Technologist @OctoML. Creator of @XGBoostProject, @ApacheTVM. Member catalyst.cs.cmu.edu, @TheASF. Views are my own.

ID: 3187990776

Link: https://tqchen.com/ · Joined: 07-05-2015 19:10:39

1.1K Tweets

17.17K Followers

1.1K Following

Matei Zaharia (@matei_zaharia):

Excited to launch Agent Bricks, a new way to build auto-optimized agents on your tasks. Agent Bricks uniquely takes a *declarative* approach to agent development: you tell us what you want, and we auto-generate evals and optimize the agent. databricks.com/blog/introduci…

Yixin Dong (@yi_xin_dong):

Databricks' Agent Bricks is powered by XGrammar for structured generation, and achieves high quality and efficiency. It helps you complete AI tasks without needing to worry about the algorithmic details. Give it a try!
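The core idea behind structured-generation libraries like XGrammar can be illustrated with a toy sketch (this is not the XGrammar API — the vocabulary, the hardcoded "grammar", and the uniform logits below are all illustrative assumptions): at each decoding step, tokens the grammar disallows are masked out of the logits, so the model can only emit structurally valid output.

```python
# Illustrative sketch of grammar-constrained decoding (not the XGrammar API).
# A toy deterministic "grammar" that only accepts the token sequence
# for the JSON object {"ok": true} or {"ok": false}.

import math

VOCAB = ['{', '"ok"', ':', ' ', 'true', 'false', '}']

# At each decoding state, only these tokens are grammatically legal.
ALLOWED = [
    {'{'}, {'"ok"'}, {':'}, {' '}, {'true', 'false'}, {'}'},
]

def constrained_decode(logits_per_step):
    """Greedy decode, but mask disallowed tokens to -inf first."""
    out = []
    for state, logits in enumerate(logits_per_step):
        masked = [
            logit if tok in ALLOWED[state] else -math.inf
            for tok, logit in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        out.append(VOCAB[best])
    return ''.join(out)

# Uniform "model" logits: the grammar mask alone forces valid structure.
steps = [[0.0] * len(VOCAB) for _ in range(6)]
print(constrained_decode(steps))  # -> {"ok": true}
```

Real systems apply the same masking per step over a full context-free grammar and a tokenizer's vocabulary, which is where the engineering difficulty (and XGrammar's efficiency work) lies.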

NVIDIA AI Developer (@nvidiaaidev):

🔍 Our deep-dive blog covering our winning MLSys paper on FlashInfer is now live ➡️ nvda.ws/3ZA1Hca

Accelerate LLM inference with FlashInfer—NVIDIA's high-performance, JIT-compiled library built for ultra-efficient transformer inference on GPUs.

Go under the hood with

LMSYS Org (@lmsysorg):

The SGLang team just ran DeepSeek 671B on NVIDIA's GB200 NVL72, unlocking 7,583 toks/sec/GPU for decoding w/ PD disaggregation + large-scale expert parallelism — 2.7× faster than H100. Don't miss this work! 🔥

Thanks to Pen Li from NVIDIA who kicked off this collaboration and

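A quick back-of-envelope check on the numbers above, assuming the NVL72 rack holds 72 GPUs and the 2.7× figure compares per-GPU decode throughput (both assumptions, not stated in the post):

```python
# Back-of-envelope arithmetic for the quoted SGLang GB200 NVL72 results.
# Assumptions: NVL72 = 72 GPUs; 2.7x compares per-GPU decode rates.

per_gpu = 7583          # tokens/sec/GPU on GB200 NVL72 (from the post)
speedup = 2.7           # claimed speedup vs H100 (from the post)
gpus = 72               # assumed GPU count in one NVL72 rack

aggregate = per_gpu * gpus           # whole-rack decode throughput
h100_per_gpu = per_gpu / speedup     # implied H100 baseline

print(f"aggregate: {aggregate:,} tok/s")              # 545,976 tok/s
print(f"implied H100: {h100_per_gpu:,.0f} tok/s/GPU") # ~2,809
```

So the per-GPU number implies roughly half a million tokens/sec of aggregate decode throughput per rack under these assumptions.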
zhyncs (@zhyncs42):

SGLang is an early user of FlashInfer and witnessed its rise as the de facto LLM inference kernel library. It won best paper at MLSys 2025, and Zihao now leads its development at NVIDIA. SGLang's GB200 NVL72 optimizations were made possible with strong support from the

NVIDIA AI Developer (@nvidiaaidev):

.LMSYS Org (SGLang) now achieves 7,583 tokens per second per GPU running DeepSeek R1 on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at

Infini-AI-Lab (@infiniailab):

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n

Tianqi Chen (@tqchenml):

Check out our work on parallel reasoning 🧠. We introduce an AI-assisted curator that identifies parallel paths in sequential traces, then tune models into native parallel thinkers that run efficiently with prefix sharing and batching. Really excited about this general direction

Beidi Chen (@beidichen):

Say hello to Multiverse — the Everything Everywhere All At Once of generative modeling. 💥 Lossless, adaptive, and gloriously parallel 🌀 Now open-sourced: multiverse4fm.github.io I was amazed how easily we could extract the intrinsic parallelism of even SOTA autoregressive

Xinyu Yang (@xinyu2ml):

🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level

Zhihao Jia (@jiazhihao):

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀 Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized

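The intuition for why megakernel fusion helps can be captured in a toy latency model (all numbers below are illustrative assumptions, not measurements of MPK or any real GPU): per-kernel launch overhead is paid once per decoding step instead of once per operator.

```python
# Toy latency model for kernel fusion. Illustrative numbers only:
# real launch overheads and op costs vary by GPU and workload.

launch_us = 5.0    # assumed per-kernel launch overhead (microseconds)
compute_us = 2.0   # assumed compute time per op (microseconds)
n_ops = 100        # assumed ops in one decoding step

# Separate kernels: overhead paid for every op.
separate = n_ops * (launch_us + compute_us)

# One fused megakernel: overhead paid once for the whole step.
fused = launch_us + n_ops * compute_us

print(f"separate: {separate} us, fused: {fused} us")  # 700.0 vs 205.0
```

When per-op compute is small relative to launch overhead, as in low-latency decoding, the overhead term dominates, which is the regime where fusing everything into one persistent kernel pays off most.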
Chris Donahue (@chrisdonahuey):

Excited to announce 🎵Magenta RealTime, the first open weights music generation model capable of real-time audio generation with real-time control. 👋 **Try Magenta RT on Colab TPUs**: colab.research.google.com/github/magenta… 👀 Blog post: g.co/magenta/rt 🧵 below

Mehdi Amini (@jokereph):

I’ve started collaborating with the folks building FlashInfer: nice project and a pretty amazing set of people! Zihao Ye, Tianqi Chen, and everyone.

Zhihao Jia (@jiazhihao):

📢 Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org.

We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to

Tianqi Chen (@tqchenml):

#MLSys2026 will be led by the general chair Luis Ceze and PC chairs Zhihao Jia and Aakanksha Chowdhery. The conference will be held in Bellevue on Seattle's east side. Consider submitting and bringing your latest works in AI and systems—more details at mlsys.org.

Banghua Zhu (@banghuaz):

Excited to share that I’m joining NVIDIA as a Principal Research Scientist!

We’ll be joining forces on efforts in model post-training, evaluation, agents, and building better AI infrastructure—with a strong emphasis on collaboration with developers and academia. We’re committed
