Amir Haghighat (@amiruci)'s Twitter Profile
Amir Haghighat

@amiruci

Co-founder @basetenco

ID: 37691978

Website: https://baseten.co · Joined: 04-05-2009 16:11:58

602 Tweets

1.1K Followers

810 Following

Baseten (@basetenco)'s Twitter Profile Photo

We're excited to announce our partnership with NVIDIA to provide inference for NVIDIA NIM models on dedicated endpoints, including the new Llama 3.3 Nemotron Super 49B! You can get a scalable, dedicated endpoint for NIM models like Llama 3.3 Nemotron now in a few clicks.

Baseten (@basetenco)'s Twitter Profile Photo

🚀 We’re thrilled to introduce Baseten Embeddings Inference (BEI), the most performant embeddings solution available! 🚀 BEI is optimized specifically for embeddings workloads, which often receive high numbers of requests and require low latency for individual queries. Across

Baseten (@basetenco)'s Twitter Profile Photo

Llama 4 is here! 🦙🚀

   Scout    | 109B Parameters  | 10M Context
Maverick | 400B Parameters | 1M Context

Llama 4 models are natively multimodal, use a MoE architecture, and set a new frontier for performance/cost.

We're excited to offer dedicated deployments of Llama 4!

Baseten (@basetenco)'s Twitter Profile Photo

New bots for Llama 4 Scout and Maverick are now live on Poe! Get started with an 8M token context window for Scout (yes, you read that right) and 1M for Maverick. We're thrilled to power the fastest open-source models for Quora—more to come!

Baseten (@basetenco)'s Twitter Profile Photo

🚀 You can now use NVIDIA B200s on Baseten and get higher throughput, lower latency, and better cost per token! 🚀

From benchmarks on models like DeepSeek R1, Llama 4, and Qwen, we’re already seeing:

• 5x higher throughput
• Over 2x better cost per token
• 38% lower latency

Baseten (@basetenco)'s Twitter Profile Photo

We have day 0 support for #Qwen3 by Alibaba Qwen on Baseten using SGLang.

Qwen 3 235B's architecture benefits from both Tensor Parallelism and Expert Parallelism to run Attention and Sparse MoE efficiently across 4 or 8 H100 GPUs depending on quantization.  

More in 🧵
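A minimal sketch of what day-0 serving with SGLang can look like, assuming an OpenAI-compatible SGLang server launched with tensor parallelism (the model ID, GPU count, and port below are illustrative; a managed Baseten deployment handles this configuration for you):

# Illustrative only: a self-hosted SGLang server with tensor parallelism across 8 GPUs
# could be launched with something like:
#   python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --tp 8 --port 30000
#
# SGLang exposes an OpenAI-compatible API, so a standard OpenAI client can query it:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # local server, no real key

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",  # illustrative model ID
    messages=[{"role": "user", "content": "Why do sparse MoE models benefit from expert parallelism?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)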

Philip Kiely (@philip_kiely)'s Twitter Profile Photo

There's a lot to be excited about with Qwen 3:

- Fits on 4xH100
- 1/4 the cost of DeepSeek-R1 in production
- "Hybrid thinking" makes reasoning optional
- Continues the Qwen tradition of being great at coding

Deployment w/ SGLang + vibe check in the video!

Elias (@eliasfiz)'s Twitter Profile Photo

People told us they want Orpheus TTS in production.

So we partnered with Baseten as our preferred inference provider!

Baseten runs Orpheus with:

• Low latency (<200 ms TTFB)
• High throughput (up to 48 real-time streams per H100)
• Secure, worldwide infra
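As a rough illustration of the TTFB figure above, a client can stream the audio response and time the first chunk; the endpoint URL, payload fields, and voice name below are hypothetical placeholders rather than the actual Orpheus-on-Baseten API:

import os
import time

import requests

# Hypothetical endpoint and request schema; consult the deployment's docs for the real ones.
ENDPOINT = "https://model-xxxxxxx.api.baseten.co/environments/production/predict"
API_KEY = os.environ["BASETEN_API_KEY"]

start = time.monotonic()
with requests.post(
    ENDPOINT,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"text": "Hello from Orpheus!", "voice": "tara"},  # illustrative fields
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for _chunk in resp.iter_content(chunk_size=4096):
        # Only measuring time to first audio byte here; a real client would keep
        # consuming chunks and feed them to an audio player as they arrive.
        print(f"TTFB: {(time.monotonic() - start) * 1000:.0f} ms")
        break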

zhyncs (@zhyncs42)'s Twitter Profile Photo

I’ll be joining my Baseten colleague Philip Kiely at the AI Engineer World’s Fair in San Francisco, June 3–5, to introduce LLM serving with SGLang (LMSYS Org). We’d love for you to stop by and exchange ideas in person!🤗

Baseten (@basetenco)'s Twitter Profile Photo

Congrats to our friends at Patronus AI on the new AI agent launch, Percival! Percival can fix other agents across 20+ common failure modes, a very necessary tool in the growing agent landscape. Check it out.

Amir Haghighat (@amiruci)'s Twitter Profile Photo

Product launch with the backstory: Internally we had always said let's do *1 thing* but do it well. For us that was inference. And we said at some point we'll earn the rights to expand the surface area beyond that. That some point is today. The vast majority of our revenue

Baseten (@basetenco)'s Twitter Profile Photo

New DeepSeek just dropped.

Proud to serve the fastest DeepSeek R1 0528 inference on OpenRouter (#1 on TTFT and TPS) with our Model APIs.
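Model APIs are OpenAI-compatible, so querying the model can look roughly like the sketch below; the base URL and model slug are assumptions for illustration, so check the Model APIs docs for the exact values:

import os

from openai import OpenAI

# Assumed base URL and model slug; verify both against Baseten's Model APIs documentation.
client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ["BASETEN_API_KEY"],
)

# Streaming makes time-to-first-token (TTFT) visible, not just end-to-end latency.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "Explain TTFT vs. TPS in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)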

Greg Schoeninger (@gregschoeninger)'s Twitter Profile Photo

The biggest addition to the Oxen.ai toolbox is our new zero-code fine-tuning offering. Simply select the dataset you want to fine-tune on and let 🐂 do the grunt work. 

At the end of the fine-tune we give you access to the raw model weights and the ability to

Google Cloud (@googlecloud)'s Twitter Profile Photo

AI inference matters. Baseten's revolutionary AI infrastructure platform, built on Google Cloud, optimizes processing even for massive models, gets your AI products to market 50% faster, and slashes costs with 90% savings compared to endpoint vendors ↓

Baseten (@basetenco)'s Twitter Profile Photo

Our customers run AI products where every millisecond and request matter. Over the years, we found fundamental limitations in traditional deployment approaches — single points of failure, regional and cloud-specific capacity constraints, and the operational headache of managing