Woosuk Kwon (@woosuk_k)'s Twitter Profile
Woosuk Kwon

@woosuk_k

PhD student at @Berkeley_EECS building @vllm_project

ID: 1650625424198881280

Link: http://woosuk.me · Joined: 24-04-2023 22:19:23

248 Tweets

3.3K Followers

552 Following

Daniel Han (@danielhanchen):

We'll be at Ollama and vLLM's inference night next Thursday! 🦥🦙 Come meet us at @YCombinator's San Francisco office. Lots of other cool open-source projects will be there too!

Jae-Won Chung (@jaewon_chung_cs):

Cornstarch 🚀 is a training system for multimodal models like VLMs. Mix and match any modality encoder(s) and an LLM for your use case!

EmbeddedLLM (@embeddedllm):

🚨 vLLM Blog Alert! vLLM introduces PTPC-FP8 quantization on AMD ROCm, delivering near-BF16 accuracy at FP8 speeds. Run LLMs faster on AMD MI300X GPUs – no pre-quantization required!

Why PTPC-FP8 rocks:
- Per-Token Activation Scaling: Each token gets its own scaling
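For intuition, here is a minimal PyTorch sketch of what per-token activation scaling means (assuming torch's float8_e4m3fn dtype; this is an illustration, not vLLM's ROCm kernel):

    import torch

    FP8_MAX = 448.0  # largest value representable in float8_e4m3fn

    def quantize_per_token(x: torch.Tensor):
        # x: [num_tokens, hidden_dim]. Each row (token) gets its own scale,
        # so a single outlier token no longer stretches the quantization
        # range for every other token, as per-tensor scaling would.
        scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / FP8_MAX
        x_fp8 = (x / scale).to(torch.float8_e4m3fn)
        return x_fp8, scale  # keep the scales to dequantize after the matmul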

Robert Nishihara (@robertnishihara):

If you're using vLLM + Ray for batch inference or online serving, check this out. We're investing heavily in making that combination work really well.
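The usual shape of that combination is Ray Data fanning prompts out to a pool of vLLM engine actors. A hedged sketch (the model id and tuning knobs are illustrative; the officially supported integration may differ):

    import ray
    from vllm import LLM, SamplingParams

    class VLLMPredictor:
        def __init__(self):
            # One vLLM engine per Ray actor; Ray places each actor on a GPU.
            self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
            self.params = SamplingParams(temperature=0.0, max_tokens=128)

        def __call__(self, batch):
            outputs = self.llm.generate(list(batch["prompt"]), self.params)
            batch["response"] = [o.outputs[0].text for o in outputs]
            return batch

    ds = ray.data.from_items([{"prompt": "Explain paged attention in one line."}] * 1024)
    ds = ds.map_batches(VLLMPredictor, concurrency=2, num_gpus=1, batch_size=64)
    print(ds.take(1))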

Ahmad Al-Dahle (@ahmad_al_dahle):

Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4

vLLM (@vllm_project):

vLLM v0.8.3 now supports AI at Meta's latest Llama 4 Scout and Maverick. We see these open source models as a major step forward in efficiency, with long-context support, native multi-modality, and an MoE architecture. Best tips for running it 🧵 blog.vllm.ai/2025/04/05/lla…
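A minimal way to try Scout with vLLM's offline API (the parallelism and context settings here are illustrative; the linked blog post has the tuned flags):

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        tensor_parallel_size=8,   # shard the MoE weights across 8 GPUs
        max_model_len=131072,     # cap the long context to fit KV-cache memory
    )
    out = llm.generate(["Summarize paged attention in two sentences."],
                       SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)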

Agentica Project (@agentica_):

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below:

Yixin Dong (@yi_xin_dong):

XGrammar is accepted to MLSys 2025🎉🎉🎉 It is a widely adopted library for structured generation with LLMs—output clean JSON, function calling, custom grammars, and more, exactly as specified. Now the default backend in MLC-LLM/SGLang/vLLM/TRT-LLM, with over 5M downloads. Check

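As a concrete example, vLLM exposes this through guided decoding, with XGrammar as the default backend; a hedged sketch (the schema and model id are illustrative):

    from vllm import LLM, SamplingParams
    from vllm.sampling_params import GuidedDecodingParams

    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    }
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
    params = SamplingParams(
        guided_decoding=GuidedDecodingParams(json=schema),  # constrain tokens to the schema
        max_tokens=100,
    )
    out = llm.generate(["Return a JSON person record for Ada Lovelace."], params)
    print(out[0].outputs[0].text)  # parses as the requested schema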
vLLM (@vllm_project):

🙏 DeepSeek's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…

vLLM (@vllm_project):

vLLM🤝🤗! You can now deploy any Hugging Face language model with vLLM's speed. The integration keeps one consistent model implementation in HF that serves both training and inference. 🧵 blog.vllm.ai/2025/04/11/tra…
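Concretely, the blog describes a Transformers backend in vLLM; a hedged sketch of selecting it (the model id is illustrative, and the flag name follows recent vLLM releases):

    from vllm import LLM, SamplingParams

    # model_impl="transformers" asks vLLM to run the Hugging Face modeling code
    # directly instead of a vLLM-native reimplementation of the architecture.
    llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", model_impl="transformers")
    print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)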

vLLM (@vllm_project):

perf update: we are continuing to see benefits with vLLM V1 engine’s highly performant design. on 8xH200, vLLM leads in throughput for DeepSeek V3/R1 models. we expect further enhancements in collaboration with DeepSeek’s inference engine open source plan.

OpenAI Developers (@openaidevs):

Announcing the first Codex open source fund grant recipients:
⬩ vLLM - inference serving engine
⬩ OWASP Nettacker - automated network pentesting
⬩ Pulumi - infrastructure as code in any language (@pulumicorp)
⬩ Dagster - cloud-native data pipelines

Junyang Lin (@justinlin610):

Thanks for the quick merge and instant support for our models! Users of vLLM and Qwen, feel free to try it out and see whether everything works for you!

Aurick Qiao (@aurickq):

Excited to share our work on Speculative Decoding at Snowflake AI Research!

🚀 4x faster LLM inference for coding agents like OpenHands (All Hands AI)

💬 2.4x faster LLM inference for interactive chat

💻 Open-sourced via Arctic Inference as a plugin for vLLM

🧵
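For readers new to the idea: speculative decoding has a small draft model propose several tokens that the large target model then verifies in a single forward pass. A hedged sketch with vLLM's offline API (Arctic Inference itself ships as a separate plugin; the config keys below follow recent vLLM versions, and both model ids are illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-70B-Instruct",        # target model
        speculative_config={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # draft model proposes tokens
            "num_speculative_tokens": 5,                  # proposals verified per step
        },
    )
    print(llm.generate(["def quicksort(arr):"],
                       SamplingParams(max_tokens=64))[0].outputs[0].text)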