vLLM (@vllm_project)'s Twitter Profile
vLLM

@vllm_project

A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!

ID: 1774187564276289536

Link: https://github.com/vllm-project/vllm
Joined: 30-03-2024 21:31:01

327 Tweets

12.12K Followers

15 Following

ray (@raydistributed)

🚨Meetup Alert🚨 

Scaling LLM inference? Join us June 10 in SF for a Ray Meetup with real-world wins from Pinterest + Anyscale

We'll discuss: 
⚡ 300x throughput w/ Ray at Pinterest
⚡ DeepSeek + vLLM prod deployment
⚡ Ray Serve + Data LLM preview

📍 Anyscale HQ |

Aurick Qiao (@aurickq)

Excited to open-source Shift Parallelism, developed at Snowflake AI Research for LLM inference!

With it, Arctic Inference + vLLM delivers:

🚀3.4x faster e2e latency & 1.06x higher throughput
🚀1.7x faster generation & 2.25x lower response time
🚀16x higher throughput

EmbeddedLLM (@embeddedllm)

vLLM 0.9.0 is HERE, unleashing HUGE performance on AMD GPUs!
MI-Series: FP8 KV cache, +19.4% with AITER GEMM, +16.8% Qwen3 MoE (MI300X), +13.8% DeepSeek V3/R1! 
Consumer: 📈 RX 9000 Series: +16.4% throughput
🚀 RX 7000 Series: +19.0% performance gains
<a href="/AIatAMD/">AI at AMD</a> <a href="/AMDRadeon/">AMD Radeon</a>
vLLM (@vllm_project)

🚀 Join us at the SF AIBrix & vLLM Meetup on June 18th at AWS SF GenAI Loft! Learn from experts at ByteDance, AWS Neuron, and EKS. Discover AIBrix: a scalable, cost-effective control plane for vLLM. Talks, Q&A, pizza, and networking! 🍕🤝 lu.ma/ab2id296

Red Hat AI (@redhat_ai)

.vLLM v0.9.0 was a BIG release 🎉

📝 649 commits
👥 143 contributors
👏 82 first-time contributors

Huge thanks to everyone who made it happen! Michael Goin breaks down what's new in vLLM v0.9.0 ⬇️

Red Hat AI (@redhat_ai)

🇯🇵 Join us for an in-person vLLM meetup on Monday, June 16 in Tokyo. Or tune in via live stream!

Agenda:
-Intro to vLLM
-Japanese LLM adoption
-Model optimization w/ LLM Compressor
-Distributed inference w/ llm-d
-Q&amp;A and lightning talks

RSVP: ossbyredhat.connpass.com/event/357695/

vLLM (@vllm_project)

Congrats on the launch! vLLM is proud to support the new Qwen3 embedding models, check it out 👉🏻 github.com/QwenLM/Qwen3-E…
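
A minimal sketch of what serving one of these as an embedding model offline might look like, assuming vLLM's "embed" task and the pooling-style embed() API; the model ID is an illustrative guess, not taken from the tweet:

from vllm import LLM

# Load the checkpoint as a pooling (embedding) model rather than a
# text generator; the model ID is assumed for illustration.
llm = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = llm.embed(["What is the capital of France?", "Paris is in France."])
for o in outputs:
    print(len(o.outputs.embedding))  # one fixed-size vector per input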

vLLM (@vllm_project)

Thanks for the great investigation! vLLM values usability, performance, and building the ecosystem for LLM inference. Together, let's make open source better ❤️ Stay tuned for the latest updates from vLLM!

ray (@raydistributed)

Our next Ray Meetup is almost here! June 10th 👇

Hear how Pinterest scaled inference 300x with Ray, watch a DeepSeek + vLLM live demo, and get a first look at new Ray Serve tools.

Plus: networking, snacks, and great convos with fellow builders.

Save your seat:

vLLM (@vllm_project)

⬆️ uv pip install -U vllm --extra-index-url wheels.vllm.ai/0.9.1rc1 --torch-backend=auto

Try out Magistral with vLLM 0.9.1rc1 today! 🔮
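
Once the release-candidate wheel is installed, trying Magistral offline might look like the sketch below; the model ID and the Mistral-format loading flags are assumptions for illustration, not taken from the tweet:

from vllm import LLM, SamplingParams

# Magistral checkpoints ship in Mistral's own format, so this sketch
# assumes the mistral tokenizer/config/load modes; model ID is assumed.
llm = LLM(
    model="mistralai/Magistral-Small-2506",
    tokenizer_mode="mistral",
    config_format="mistral",
    load_format="mistral",
)

out = llm.chat([{"role": "user", "content": "Why is the sky blue?"}],
               SamplingParams(max_tokens=256))
print(out[0].outputs[0].text)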

vLLM (@vllm_project)

👀 Look what just arrived at UC Berkeley Sky! 🌟 A shiny MI355X system. Huge thanks to AMD for supporting open source and we are looking forward to getting it set up in the next few days!

Anush Elangovan (@anushelangovan)

Glad to support UC Berkeley Sky and the vLLM community. Day-0 support means you get hardware on Day -2 😀. Looking forward to what the community builds, and to accelerating AI adoption.

Robert Nishihara (@robertnishihara)

This table was a footnote at the end of the blog, but it's actually one of the most interesting points. There is an emerging stack for post-training.

anyscale.com/blog/ai-comput…