brian stevens (@addvin)'s Twitter Profile
brian stevens

@addvin

CEO, Neural Magic. Ex-VP & CTO of Google Cloud and ex-EVP & CTO of Red Hat. RPI and UNH alum, marathoner, Ironman, ADK MT 46er.

ID: 18744691

Joined: 07-01-2009 23:43:02

738 Tweets

4.4K Followers

159 Following

vLLM (@vllm_project)'s Twitter Profile Photo

⚔ The Llama 3.1 series is uniquely challenging due to its long context and large size. We want to thank Red Hat AI (formerly Neural Magic) for their continual stewardship of the quantization code path in vLLM, Anyscale for their high-quality implementation of chunked prefill and speculative decoding,
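
A minimal sketch of what the long-context serving path looks like from the user side, assuming a recent vLLM release where chunked prefill is exposed as the `enable_chunked_prefill` engine argument (the model id and lengths below are illustrative):

```python
from vllm import LLM, SamplingParams

# Chunked prefill splits a long prompt's prefill into smaller chunks that can
# be batched together with ongoing decode steps, instead of stalling them.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative checkpoint
    max_model_len=32768,          # Llama 3.1 supports context up to 128k
    enable_chunked_prefill=True,
)

params = SamplingParams(temperature=0.0, max_tokens=64)
out = llm.generate(["Summarize chunked prefill in one sentence."], params)
print(out[0].outputs[0].text)
```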

Mark Kurtz (@markurtz_)'s Twitter Profile Photo

🧵1/4 Our Llama 3.1 compression project is underway, aiming for cost-effective and sustainable deployments without compromising accuracy. The FP8 quantized Llama 3.1 8B model has already achieved over 99% recovery! 🎯📉 #LLMs #vLLM #AI #MachineLearning #Quantization
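
For readers wondering what "FP8 quantized" means mechanically: each weight tensor is rescaled into the 8-bit E4M3 floating-point range and cast down. A minimal per-tensor sketch in plain PyTorch (requires float8 support, PyTorch >= 2.1; the released models were produced with Neural Magic's tooling, so this only illustrates the arithmetic):

```python
import torch

def quantize_fp8(w: torch.Tensor):
    """Per-tensor FP8 (E4M3) quantization: scale max |w| onto the format's range."""
    finfo = torch.finfo(torch.float8_e4m3fn)
    scale = w.abs().max().clamp(min=1e-12) / finfo.max
    q = (w / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    return q, scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, s = quantize_fp8(w)
print((w - dequantize_fp8(q, s)).abs().mean())  # mean quantization error is small
```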

Roshan Sumbaly (@rsumbaly)'s Twitter Profile Photo

Great to see the community moving fast to adapt Llama 3.1 to their needs. This is the beauty of open source and a key part of why we're going to share more of our system-level thinking with Llama Stack. Great work vLLM and @neuralmagic folks - let's find more ways to work

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

Our team has been busy releasing quantized Llama 3.1 models, thoroughly evaluated to ensure optimal performance in #vLLM. Check them out and let us know your thoughts and feedback by commenting on this post. 🙏 AI at Meta vLLM huggingface.co/collections/ne…
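
Pre-quantized checkpoints like these load in vLLM like any other model; the quantization scheme is picked up from the checkpoint's config. A quick sketch (the collection link above is truncated, so the repo id below is an assumption):

```python
from vllm import LLM, SamplingParams

# The FP8 scheme is read from the checkpoint's config; no extra flags needed.
llm = LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8")  # assumed repo id
out = llm.generate(["What is vLLM?"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```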

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

vLLM is the leading open-source inference server with 24k GitHub stars. Join us for bi-weekly vLLM Office Hours to learn about the project, get involved, and provide feedback. Here's what to expect this week: 1. Get the latest updates from Neural Magic’s Engineering Lead and top

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

Sparse-Marlin is here and integrated into vLLM! This GPU-optimized kernel accelerates matrix multiplication with 4-bit quantized weights and 2:4 sparsity, achieving 5.3x speedups on NVIDIA GPUs (Ampere/Ada). Maintains efficiency with batch sizes up to 32. Links below.

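For reference, 2:4 ("semi-structured") sparsity keeps at most two non-zero weights in every contiguous group of four, which is what lets hardware skip half the multiplies. A small PyTorch sketch of imposing the pattern by magnitude (illustration only; Sparse-Marlin itself is a fused CUDA kernel operating on packed 4-bit weights):

```python
import torch

def prune_2_4(w: torch.Tensor) -> torch.Tensor:
    """Zero the two smallest-magnitude weights in every contiguous group of four."""
    groups = w.reshape(-1, 4)
    keep = groups.abs().topk(2, dim=1).indices            # 2 largest |w| per group
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(1, keep, True)
    return (groups * mask).reshape(w.shape)

w = torch.randn(8, 8)
print(prune_2_4(w))  # exactly two non-zeros in every group of four
```
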
brian stevens (@addvin)'s Twitter Profile Photo

Quantization of LLMs is critical for efficient deployments. But how do you avoid any negative impact of quantization on model capability? Our latest research across Llama variants will serve as a great guide.
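
The yardstick in that research is accuracy recovery: the quantized model's benchmark score as a percentage of the full-precision baseline's. A worked example with illustrative numbers:

```python
def recovery(quantized: float, baseline: float) -> float:
    """Quantized score as a percentage of the full-precision baseline."""
    return 100.0 * quantized / baseline

# e.g. a baseline scoring 69.0 and an FP8 model scoring 68.5 on the same
# benchmark gives ~99.3% recovery -- the "over 99%" cited above
print(f"{recovery(68.5, 69.0):.1f}%")  # 99.3%
```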

brian stevens (@addvin)'s Twitter Profile Photo

I’m thrilled to announce that Neural Magic has signed a definitive agreement to join forces with Red Hat, Inc.

At Neural Magic, our vision is that the future of AI is open, and we have been on a mission to enable enterprises to capture the powerful innovation from AI, while at
Scale ML (@scaleml)'s Twitter Profile Photo

For our last seminar of the year we will end with Lucas Wilkinson from @neuralmagic presenting! 

Machete: a cutting-edge mixed-input GEMM GPU kernel targeting NVIDIA Hopper GPUs

Time: Dec 4, 3pm EST
Sign up via scale-ml.org to join our mailing list for the Zoom link.
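
"Mixed-input" means the two GEMM operands have different types, e.g. 16-bit activations against 4-bit quantized weights, with the upconversion fused into the kernel. A plain-PyTorch sketch of just the math (int8 values in [-8, 7] stand in for packed 4-bit weights; the real kernel dequantizes in-register inside the Hopper tensor-core pipeline):

```python
import torch

def mixed_input_gemm(a: torch.Tensor, w_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """a: [M, K] float activations; w_q: [K, N] int8 holding 4-bit values in
    [-8, 7]; scale: [N] per-output-channel scales."""
    w = w_q.to(a.dtype) * scale  # dequantize; fused into the GEMM in a real kernel
    return a @ w

a = torch.randn(2, 64)                                   # fp16 on GPU in practice
w_q = torch.randint(-8, 8, (64, 32), dtype=torch.int8)
scale = torch.full((32,), 0.05)
print(mixed_input_gemm(a, w_q, scale).shape)             # torch.Size([2, 32])
```
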
Red Hat AI (@redhat_ai)'s Twitter Profile Photo

If you are at #NeurIPS2024 this week, stop by the Neural Magic booth #307 and talk to us about vLLM! vLLM core committer Michael Goin will be there, ready to hear your ideas and share them with the team. The best feature requests always come from in-person chats!

Matt Hicks (@matthicksj)'s Twitter Profile Photo

At Red Hat, we believe the future of AI is open. That's why I'm incredibly excited about our acquisition of Neural Magic (now Red Hat AI). Together, we're furthering our commitment to our customers and the open source community to deliver on the future of AI—and that starts today.

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

DeepSeek’s Open Source Week drops A LOT of exciting goodies! We’re hosting vLLM Office Hours tomorrow—learn what they are, how they integrate with vLLM, & ask questions!

Date: Thursday, Feb 27
Time: 2PM ET / 11AM PT

Register: neuralmagic.com/community-offi… #DeepSeek #AI
NVIDIA AI Developer (@nvidiaaidev)'s Twitter Profile Photo

The llm-d project is a major step forward for the #opensource AI ecosystem, and we are proud to be one of the founding contributors, reflecting our commitment to collaboration as a catalyst for innovation in generative AI.

As generative and agentic AI continue to evolve,
Mark Collier ęŸÆē†ę€€ (@sparkycollier) 's Twitter Profile Photo

Really excited to see the emergence of llm-d, brian stevens! Inference is the biggest workload in human history, and the open source tools need to keep evolving to serve it.

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

Thanks to the LMCache Lab team for joining forces with Red Hat on llm-d! llm-d is a new open source project for scalable, efficient distributed LLM inference with vLLM. Learn more about our collaboration here: blog.lmcache.ai/2025-05-22-red…