Amir Haghighat (@amiruci)'s Twitter Profile
Amir Haghighat

@amiruci

Co-founder @basetenco

ID: 37691978

Website: https://baseten.co · Joined: 04-05-2009 16:11:58

602 Tweets

1.1K Followers

810 Following

Baseten (@basetenco)'s Twitter Profile Photo

We're excited to announce our partnership with NVIDIA to provide inference for NVIDIA NIM models on dedicated endpoints, including the new Llama 3.3 Nemotron Super 49B! You can get a scalable, dedicated endpoint for NIM models like Llama 3.3 Nemotron now in a few clicks.

Baseten (@basetenco)'s Twitter Profile Photo

🚀 We’re thrilled to introduce Baseten Embeddings Inference (BEI), the most performant embeddings solution available! 🚀 BEI is optimized specifically for embeddings workloads, which often receive high numbers of requests and require low latency for individual queries. Across

Baseten (@basetenco)'s Twitter Profile Photo

Llama 4 is here! 🦙🚀

   Scout    | 109B Parameters  | 10M Context
Maverick | 400B Parameters | 1M Context

Llama 4 models are natively multimodal, use a MoE architecture, and set a new frontier for performance/cost.

We're excited to offer dedicated deployments of Llama 4!

Baseten (@basetenco)'s Twitter Profile Photo

New bots for Llama 4 Scout and Maverick are now live on Poe! Get started with an 8M token context window for Scout (yes, you read that right) and 1M for Maverick. We're thrilled to power the fastest open-source models for Quora—more to come!

Baseten (@basetenco)'s Twitter Profile Photo

🚀 You can now use NVIDIA B200s on Baseten and get higher throughput, lower latency, and better cost per token! 🚀

From benchmarks on models like DeepSeek R1, Llama 4, and Qwen, we’re already seeing:

• 5x higher throughput
• Over 2x better cost per token
• 38% lower latency

Baseten (@basetenco)'s Twitter Profile Photo

We have day 0 support for #Qwen3 by Alibaba Qwen on Baseten using SGLang.

Qwen 3 235B's architecture benefits from both Tensor Parallelism and Expert Parallelism to run Attention and Sparse MoE efficiently across 4 or 8 H100 GPUs depending on quantization.  

More in 🧵
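A minimal sketch of what day-0 serving with SGLang can look like, assuming an OpenAI-compatible SGLang server launched with tensor parallelism (the model ID, GPU count, and port below are illustrative; a managed Baseten deployment handles this configuration for you):

# Illustrative only: a self-hosted SGLang server with tensor parallelism across 8 GPUs
# could be launched with something like:
#   python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --tp 8 --port 30000
#
# SGLang exposes an OpenAI-compatible API, so a standard OpenAI client can query it:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # local server, no real key

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",  # illustrative model ID
    messages=[{"role": "user", "content": "Why do sparse MoE models benefit from expert parallelism?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)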

Philip Kiely (@philip_kiely)'s Twitter Profile Photo

There's a lot to be excited about with Qwen 3:

- Fits on 4xH100
- 1/4 the cost of DeepSeek-R1 in production
- "Hybrid thinking" makes reasoning optional
- Continues the Qwen tradition of being great at coding

Deployment w/ SGLang + vibe check in the video!

Elias (@eliasfiz)'s Twitter Profile Photo

People told us they want Orpheus TTS in production.

So we partnered with Baseten as our preferred inference provider!

Baseten runs Orpheus with:

• Low latency (<200 ms TTFB)
• High throughput (up to 48 real-time streams per H100)
• Secure, worldwide infra
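As a rough illustration of the TTFB figure above, a client can stream the audio response and time the first chunk; the endpoint URL, payload fields, and voice name below are hypothetical placeholders rather than the actual Orpheus-on-Baseten API:

import os
import time

import requests

# Hypothetical endpoint and request schema; consult the deployment's docs for the real ones.
ENDPOINT = "https://model-xxxxxxx.api.baseten.co/environments/production/predict"
API_KEY = os.environ["BASETEN_API_KEY"]

start = time.monotonic()
with requests.post(
    ENDPOINT,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"text": "Hello from Orpheus!", "voice": "tara"},  # illustrative fields
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for _chunk in resp.iter_content(chunk_size=4096):
        # Only measuring time to first audio byte here; a real client would keep
        # consuming chunks and feed them to an audio player as they arrive.
        print(f"TTFB: {(time.monotonic() - start) * 1000:.0f} ms")
        break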

zhyncs (@zhyncs42)'s Twitter Profile Photo

I’ll be joining my Baseten colleague Philip Kiely at the AI Engineer World’s Fair in San Francisco, June 3–5, to introduce LLM serving with SGLang (LMSYS Org). We’d love for you to stop by and exchange ideas in person!🤗

Baseten (@basetenco)'s Twitter Profile Photo

Congrats to our friends at Patronus AI on the new AI agent launch, Percival! Percival can fix other agents across 20+ common failure modes, a very necessary tool in the growing agent landscape. Check it out.

Amir Haghighat (@amiruci)'s Twitter Profile Photo

Product launch with the backstory: Internally we had always said let's do *1 thing* but do it well. For us that was inference. And we said at some point we'll earn the rights to expand the surface area beyond that. That some point is today. The vast majority of our revenue

Baseten (@basetenco)'s Twitter Profile Photo

New DeepSeek just dropped.

Proud to serve the fastest DeepSeek R1 0528 inference on OpenRouter (#1 on TTFT and TPS) with our Model APIs.
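Model APIs are OpenAI-compatible, so querying the model can look roughly like the sketch below; the base URL and model slug are assumptions for illustration, so check the Model APIs docs for the exact values:

import os

from openai import OpenAI

# Assumed base URL and model slug; verify both against Baseten's Model APIs documentation.
client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ["BASETEN_API_KEY"],
)

# Streaming makes time-to-first-token (TTFT) visible, not just end-to-end latency.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "Explain TTFT vs. TPS in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)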

Greg Schoeninger (@gregschoeninger)'s Twitter Profile Photo

The biggest addition to the Oxen.ai toolbox is our new zero-code fine-tuning offering. Simply select the dataset you want to fine-tune on and let 🐂 do the grunt work. 

At the end of the fine-tune we give you access to the raw model weights and the ability to

Google Cloud (@googlecloud)'s Twitter Profile Photo

AI inference matters. Baseten's revolutionary AI infrastructure platform, built on Google Cloud, optimizes processing even for massive models, gets your AI products to market 50% faster, and slashes costs with 90% savings compared to endpoint vendors ↓

Baseten (@basetenco)'s Twitter Profile Photo

Our customers run AI products where every millisecond and request matter. Over the years, we found fundamental limitations in traditional deployment approaches — single points of failure, regional and cloud-specific capacity constraints, and the operational headache of managing