SGLang (@sgl_project)'s Twitter Profile
SGLang

@sgl_project

SGLang project (sglang.ai). This is an alias account for SGLang; please follow @lmsysorg.

ID: 1923440305703092226

Joined: 16-05-2025 18:08:28

7 Tweets

90 Followers

13 Following

LMSYS Org (@lmsysorg):

🚀 Breaking: SGLang provides the first open-source implementation to serve DeepSeek V3/R1 models with large-scale expert parallelism and prefill-decode disaggregation on 96 GPUs. It nearly matches the throughput reported by the official DeepSeek blog, achieving 52.3K input…
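[Editor's note] The tweet names two techniques: prefill-decode (PD) disaggregation, which runs the compute-bound prefill phase and the memory-bound decode phase on separate GPU pools, and large-scale expert parallelism, which shards a MoE model's experts across many GPUs. Below is a minimal sketch of launching the two worker roles with SGLang; the --disaggregation-mode flag name is an assumption based on recent SGLang releases, and this is not the exact 96-GPU recipe from the post.

```python
# Minimal sketch, not the exact setup from the post: start separate prefill
# and decode SGLang workers, since PD disaggregation assigns each phase its
# own GPUs. Flag names are assumptions; check the SGLang docs for your version.
import subprocess

MODEL = "deepseek-ai/DeepSeek-V3"

def launch_worker(role: str, port: int) -> subprocess.Popen:
    # role is "prefill" or "decode"; each worker serves one phase.
    return subprocess.Popen([
        "python", "-m", "sglang.launch_server",
        "--model-path", MODEL,
        "--trust-remote-code",
        "--tp", "8",                    # tensor parallelism within the node
        "--disaggregation-mode", role,  # assumed flag; see SGLang docs
        "--port", str(port),
    ])

prefill = launch_worker("prefill", 30000)
decode = launch_worker("decode", 30001)
```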
Nebius (@nebiusai):

The LMSYS Org team in charge of SGLang, a pioneering LLM inference framework, teamed up with Nebius AI Cloud to supercharge DeepSeek R1’s performance for real-world use. Read the full story: nebius.com/customer-stori…

🔹 Goal: Maximize inference throughput for DeepSeek models
LMSYS Org (@lmsysorg):

The SGLang team just ran DeepSeek 671B on NVIDIA’s GB200 NVL72, unlocking 7,583 toks/sec/GPU for decoding w/ PD disaggregation + large-scale expert parallelism — 2.7× faster than H100. Don’t miss this work! 🔥

Thanks to Pen Li from NVIDIA who kicked off this collaboration and…
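[Editor's note] As a quick sanity check on the 2.7× figure (my arithmetic, not from the post), dividing the GB200 throughput by the speedup gives the implied H100 baseline:

```python
# Back-of-the-envelope check of the claim above (not from the post):
# if GB200 NVL72 decoding reaches 7,583 tok/s/GPU and that is a 2.7x
# speedup over H100, the implied H100 baseline is about 2,800 tok/s/GPU.
gb200 = 7583          # decode throughput, tokens/sec/GPU
speedup = 2.7
h100 = gb200 / speedup
print(f"implied H100 baseline: ~{h100:,.0f} tok/s/GPU")  # ~2,809
```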
NVIDIA AI Developer (@nvidiaaidev):

LMSYS Org (SGLang) now achieves 7,583 tokens per second per GPU running DeepSeek R1 on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at…
LMSYS Org (@lmsysorg):

We're excited to release OME, which is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource…
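[Editor's note] For context on "Kubernetes operator": an operator watches custom resources and reconciles cluster state to match them, so a user deploys a model by applying one declarative object. The sketch below shows that general pattern with the official Kubernetes Python client; the group, version, kind, and spec fields are hypothetical placeholders, not OME's actual CRD schema — see the OME docs for the real resource shape.

```python
# Illustrative pattern only: submitting a custom resource that an operator
# like OME would reconcile into a running model server. All resource names
# and spec fields below are hypothetical, not OME's real CRD.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "example.ome.io/v1alpha1",  # hypothetical group/version
    "kind": "InferenceService",               # hypothetical kind
    "metadata": {"name": "deepseek-r1", "namespace": "llm-serving"},
    "spec": {
        # Hypothetical fields: the operator would fetch the model, select a
        # runtime (e.g. SGLang), and allocate GPU resources automatically.
        "model": "deepseek-ai/DeepSeek-R1",
        "runtime": "sglang",
        "gpusPerReplica": 8,
    },
}

api.create_namespaced_custom_object(
    group="example.ome.io", version="v1alpha1",
    namespace="llm-serving", plural="inferenceservices",
    body=inference_service,
)
```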
Rajko Radovanović (@rajko_rad):

We @a16z just launched the third batch of Open Source AI Grants (cc Mike Bornstein) 🎉 This round includes projects focused on LLM evaluation, novel reasoning tests, infrastructure, and experimental research at the edge of capability and cognition:

• SGLang: High-performance LLM…