Woosuk Kwon (@woosuk_k)'s Twitter Profile
Woosuk Kwon

@woosuk_k

PhD student at @Berkeley_EECS building @vllm_project

ID: 1650625424198881280

Link: http://woosuk.me · Joined: 24-04-2023 22:19:23

248 Tweets

3.3K Followers

552 Following

Daniel Han (@danielhanchen):

We'll be at Ollama and vLLM's inference night next Thursday! 🦥🦙 Come meet us at @YCombinator's San Francisco office. Lots of other cool open-source projects will be there too!

Jae-Won Chung (@jaewon_chung_cs):

Cornstarch 🚀 is a training system for multimodal models like VLMs. Mix and match any modality encoder(s) and an LLM for your use case!

EmbeddedLLM (@embeddedllm):

🚨 vLLM Blog Alert! vLLM introduces PTPC-FP8 quantization on AMD ROCm, delivering near-BF16 accuracy at FP8 speeds. Run LLMs faster on AMD MI300X GPUs – no pre-quantization required!

Why PTPC-FP8 rocks:
- Per-Token Activation Scaling: Each token gets its own scaling
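For intuition, here is a minimal PyTorch sketch of what per-token activation scaling means (assuming torch's float8_e4m3fn dtype; this is an illustration, not vLLM's ROCm kernel):

    import torch

    FP8_MAX = 448.0  # largest value representable in float8_e4m3fn

    def quantize_per_token(x: torch.Tensor):
        # x: [num_tokens, hidden_dim]. Each row (token) gets its own scale,
        # so a single outlier token no longer stretches the quantization
        # range for every other token, as per-tensor scaling would.
        scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / FP8_MAX
        x_fp8 = (x / scale).to(torch.float8_e4m3fn)
        return x_fp8, scale  # keep the scales to dequantize after the matmul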

Robert Nishihara (@robertnishihara):

If you're using vLLM + Ray for batch inference or online serving, check this out. We're investing heavily in making that combination work really well.
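The usual shape of that combination is Ray Data fanning prompts out to a pool of vLLM engine actors. A hedged sketch (the model id and tuning knobs are illustrative; the officially supported integration may differ):

    import ray
    from vllm import LLM, SamplingParams

    class VLLMPredictor:
        def __init__(self):
            # One vLLM engine per Ray actor; Ray places each actor on a GPU.
            self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
            self.params = SamplingParams(temperature=0.0, max_tokens=128)

        def __call__(self, batch):
            outputs = self.llm.generate(list(batch["prompt"]), self.params)
            batch["response"] = [o.outputs[0].text for o in outputs]
            return batch

    ds = ray.data.from_items([{"prompt": "Explain paged attention in one line."}] * 1024)
    ds = ds.map_batches(VLLMPredictor, concurrency=2, num_gpus=1, batch_size=64)
    print(ds.take(1))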

Ahmad Al-Dahle (@ahmad_al_dahle):

Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4

vLLM (@vllm_project):

vLLM v0.8.3 now supports AI at Meta's latest Llama 4 Scout and Maverick. We see these open source models as a major step forward in efficiency, with long-context support, native multi-modality, and an MoE architecture. Best tips for running it 🧵 blog.vllm.ai/2025/04/05/lla…
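A minimal way to try Scout with vLLM's offline API (the parallelism and context settings here are illustrative; the linked blog post has the tuned flags):

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        tensor_parallel_size=8,   # shard the MoE weights across 8 GPUs
        max_model_len=131072,     # cap the long context to fit KV-cache memory
    )
    out = llm.generate(["Summarize paged attention in two sentences."],
                       SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)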

Agentica Project (@agentica_):

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below:

Yixin Dong (@yi_xin_dong):

XGrammar is accepted to MLSys 2025🎉🎉🎉 It is a widely adopted library for structured generation with LLMs—output clean JSON, function calling, custom grammars, and more, exactly as specified. Now the default backend in MLC-LLM/SGLang/vLLM/TRT-LLM, with over 5M downloads. Check

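As a concrete example, vLLM exposes this through guided decoding, with XGrammar as the default backend; a hedged sketch (the schema and model id are illustrative):

    from vllm import LLM, SamplingParams
    from vllm.sampling_params import GuidedDecodingParams

    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    }
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
    params = SamplingParams(
        guided_decoding=GuidedDecodingParams(json=schema),  # constrain tokens to the schema
        max_tokens=100,
    )
    out = llm.generate(["Return a JSON person record for Ada Lovelace."], params)
    print(out[0].outputs[0].text)  # parses as the requested schema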
vLLM (@vllm_project):

🙏 DeepSeek's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…

vLLM (@vllm_project):

vLLM🤝🤗! You can now deploy any Hugging Face language model with vLLM's speed. The integration keeps one consistent model implementation in HF that serves both training and inference. 🧵 blog.vllm.ai/2025/04/11/tra…
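Concretely, the blog describes a Transformers backend in vLLM; a hedged sketch of selecting it (the model id is illustrative, and the flag name follows recent vLLM releases):

    from vllm import LLM, SamplingParams

    # model_impl="transformers" asks vLLM to run the Hugging Face modeling code
    # directly instead of a vLLM-native reimplementation of the architecture.
    llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", model_impl="transformers")
    print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)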

vLLM (@vllm_project):

perf update: we are continuing to see benefits with vLLM V1 engine’s highly performant design. on 8xH200, vLLM leads in throughput for DeepSeek V3/R1 models. we expect further enhancements in collaboration with DeepSeek’s inference engine open source plan.

OpenAI Developers (@openaidevs):

Announcing the first Codex open source fund grant recipients:
⬩ vLLM - inference serving engine
⬩ OWASP Nettacker - automated network pentesting
⬩ Pulumi - infrastructure as code in any language (@pulumicorp)
⬩ Dagster - cloud-native data pipelines

Junyang Lin (@justinlin610):

Thanks for the quick merge and instant support for our models! Users of vLLM and Qwen, feel free to try it out and see whether everything works for you!

Aurick Qiao (@aurickq):

Excited to share our work on Speculative Decoding at Snowflake AI Research!

🚀 4x faster LLM inference for coding agents like OpenHands (All Hands AI)

💬 2.4x faster LLM inference for interactive chat

💻 Open-sourced via Arctic Inference as a plugin for vLLM

🧵
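For readers new to the idea: speculative decoding has a small draft model propose several tokens that the large target model then verifies in a single forward pass. A hedged sketch with vLLM's offline API (Arctic Inference itself ships as a separate plugin; the config keys below follow recent vLLM versions, and both model ids are illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-70B-Instruct",        # target model
        speculative_config={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # draft model proposes tokens
            "num_speculative_tokens": 5,                  # proposals verified per step
        },
    )
    print(llm.generate(["def quicksort(arr):"],
                       SamplingParams(max_tokens=64))[0].outputs[0].text)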