SGLang (@sgl_project)'s Twitter Profile
SGLang

@sgl_project

SGLang project (sglang.ai). This is an alias account for SGLang; please follow @lmsysorg.

ID: 1923440305703092226

Joined: 16-05-2025 18:08:28

7 Tweets

90 Followers

13 Following

LMSYS Org (@lmsysorg):

🚀 Breaking: SGLang provides the first open-source implementation to serve DeepSeek V3/R1 models with large-scale expert parallelism and prefill-decode disaggregation on 96 GPUs. It nearly matches the throughput reported by the official DeepSeek blog, achieving 52.3K input…
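[Editor's note] The tweet names two techniques: prefill-decode (PD) disaggregation, which runs the compute-bound prefill phase and the memory-bound decode phase on separate GPU pools, and large-scale expert parallelism, which shards a MoE model's experts across many GPUs. Below is a minimal sketch of launching the two worker roles with SGLang; the --disaggregation-mode flag name is an assumption based on recent SGLang releases, and this is not the exact 96-GPU recipe from the post.

```python
# Minimal sketch, not the exact setup from the post: start separate prefill
# and decode SGLang workers, since PD disaggregation assigns each phase its
# own GPUs. Flag names are assumptions; check the SGLang docs for your version.
import subprocess

MODEL = "deepseek-ai/DeepSeek-V3"

def launch_worker(role: str, port: int) -> subprocess.Popen:
    # role is "prefill" or "decode"; each worker serves one phase.
    return subprocess.Popen([
        "python", "-m", "sglang.launch_server",
        "--model-path", MODEL,
        "--trust-remote-code",
        "--tp", "8",                    # tensor parallelism within the node
        "--disaggregation-mode", role,  # assumed flag; see SGLang docs
        "--port", str(port),
    ])

prefill = launch_worker("prefill", 30000)
decode = launch_worker("decode", 30001)
```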
Nebius (@nebiusai):

The LMSYS Org team in charge of SGLang, a pioneering LLM inference framework, teamed up with Nebius AI Cloud to supercharge DeepSeek R1’s performance for real-world use. Read the full story: nebius.com/customer-stori…

🔹 Goal: Maximize inference throughput for DeepSeek models
LMSYS Org (@lmsysorg):

The SGLang team just ran DeepSeek 671B on NVIDIA’s GB200 NVL72, unlocking 7,583 toks/sec/GPU for decoding w/ PD disaggregation + large-scale expert parallelism — 2.7× faster than H100. Don’t miss this work! 🔥

Thanks to Pen Li from NVIDIA who kicked off this collaboration and…
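[Editor's note] As a quick sanity check on the 2.7× figure (my arithmetic, not from the post), dividing the GB200 throughput by the speedup gives the implied H100 baseline:

```python
# Back-of-the-envelope check of the claim above (not from the post):
# if GB200 NVL72 decoding reaches 7,583 tok/s/GPU and that is a 2.7x
# speedup over H100, the implied H100 baseline is about 2,800 tok/s/GPU.
gb200 = 7583          # decode throughput, tokens/sec/GPU
speedup = 2.7
h100 = gb200 / speedup
print(f"implied H100 baseline: ~{h100:,.0f} tok/s/GPU")  # ~2,809
```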
NVIDIA AI Developer (@nvidiaaidev):

LMSYS Org (SGLang) now achieves 7,583 tokens per second per GPU running DeepSeek R1 on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at…
LMSYS Org (@lmsysorg):

We're excited to release OME, which is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource…
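[Editor's note] For context on "Kubernetes operator": an operator watches custom resources and reconciles cluster state to match them, so a user deploys a model by applying one declarative object. The sketch below shows that general pattern with the official Kubernetes Python client; the group, version, kind, and spec fields are hypothetical placeholders, not OME's actual CRD schema — see the OME docs for the real resource shape.

```python
# Illustrative pattern only: submitting a custom resource that an operator
# like OME would reconcile into a running model server. All resource names
# and spec fields below are hypothetical, not OME's real CRD.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "example.ome.io/v1alpha1",  # hypothetical group/version
    "kind": "InferenceService",               # hypothetical kind
    "metadata": {"name": "deepseek-r1", "namespace": "llm-serving"},
    "spec": {
        # Hypothetical fields: the operator would fetch the model, select a
        # runtime (e.g. SGLang), and allocate GPU resources automatically.
        "model": "deepseek-ai/DeepSeek-R1",
        "runtime": "sglang",
        "gpusPerReplica": 8,
    },
}

api.create_namespaced_custom_object(
    group="example.ome.io", version="v1alpha1",
    namespace="llm-serving", plural="inferenceservices",
    body=inference_service,
)
```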
Rajko Radovanović (@rajko_rad):

We @a16z just launched the third batch of Open Source AI Grants (cc Mike Bornstein) 🎉 This round includes projects focused on LLM evaluation, novel reasoning tests, infrastructure, and experimental research at the edge of capability and cognition:

• SGLang: High-performance LLM…