Philip Kiely (@philip_kiely)'s Twitter Profile
Philip Kiely

@philip_kiely

DevRel @basetenco | Not an LLM (yet)

Author: wfsd.com & lifechangingemail.com

ID: 1070356464873758720

Link: http://philipkiely.com
Joined: 05-12-2018 16:37:22

1.1K Tweets

2.2K Followers

335 Following

Baseten (@basetenco)'s Twitter Profile Photo

We have day 0 support for #Qwen3 by Alibaba Qwen on Baseten using SGLang.

Qwen 3 235B's architecture benefits from both Tensor Parallelism and Expert Parallelism to run Attention and Sparse MoE efficiently across 4 or 8 H100 GPUs depending on quantization.  

More in 🧵
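A rough sketch of why the GPU count in the tweet above depends on quantization: the model weights alone dictate a minimum memory footprint (the 20% headroom factor is an assumption; real deployments also budget for KV cache and activations):

```python
import math

# Back-of-envelope GPU sizing for Qwen 3 235B (illustrative arithmetic only).
PARAMS_B = 235          # total parameters, in billions
H100_MEM_GB = 80        # HBM capacity per H100

def min_gpus(bytes_per_param: float, overhead: float = 1.2) -> int:
    """H100s needed just to hold the weights, with ~20% headroom assumed."""
    weights_gb = PARAMS_B * bytes_per_param
    return math.ceil(weights_gb * overhead / H100_MEM_GB)

print(min_gpus(2.0))  # BF16 (2 bytes/param): 8 GPUs
print(min_gpus(1.0))  # FP8  (1 byte/param):  4 GPUs
```

This lines up with the "4 or 8 H100 GPUs depending on quantization" figure: halving the bytes per parameter halves the weight footprint.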
Philip Kiely (@philip_kiely)'s Twitter Profile Photo

There's a lot to be excited about with Qwen 3:

- Fits on 4xH100
- 1/4 the cost of DeepSeek-R1 in production
- "Hybrid thinking" makes reasoning optional
- Continues the Qwen tradition of being great at coding

Deployment w/ SGLang + vibe check in the video!

Baseten (@basetenco)'s Twitter Profile Photo

Early benchmarks of Qwen 3 with SGLang show promising initial results and key avenues for improvement.

We're seeing:
- Up to 76 TPS per user for real-time
- Up to 4,600 total tokens per second for batch
- 32 concurrent requests as a good balance for prod

Details in 🧵
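As a back-of-envelope check on what the batch figure above buys you, a small sketch (the benchmark number is from the tweet; the flat-out utilization is an assumption):

```python
# Rough capacity math from the benchmark tweet (illustrative only).
batch_total_tps = 4600          # aggregate tokens/sec in batch mode

# Daily token budget if the deployment runs flat out in batch mode:
seconds_per_day = 24 * 60 * 60
tokens_per_day = batch_total_tps * seconds_per_day
print(f"{tokens_per_day:,} tokens/day")  # 397,440,000
```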
Elias (@eliasfiz)'s Twitter Profile Photo

People told us they want Orpheus TTS in production.

So we partnered with Baseten as our preferred inference provider!

Baseten runs Orpheus with:

• Low latency (<200 ms TTFB)
• High throughput (up to 48 real-time streams per H100)
• Secure, worldwide infra
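The 48-streams-per-H100 figure translates directly into capacity planning. A minimal sketch (the helper and its name are hypothetical, not part of any Baseten API):

```python
import math

# Capacity planning from the quoted figure: up to 48 real-time TTS streams per H100.
STREAMS_PER_H100 = 48

def gpus_needed(concurrent_streams: int) -> int:
    """H100s needed to serve a target number of concurrent real-time streams."""
    return math.ceil(concurrent_streams / STREAMS_PER_H100)

print(gpus_needed(100))  # 3
```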
zhyncs (@zhyncs42)'s Twitter Profile Photo

I’ll be joining my Baseten colleague Philip Kiely at the AI Engineer World’s Fair in San Francisco, June 3–5, to introduce LLM serving with SGLang (LMSYS Org). We’d love for you to stop by and exchange ideas in person!🤗
Philip Kiely (@philip_kiely)'s Twitter Profile Photo

Great chatting all things voice agents with kwindla today in his course!

Main takeaway: infra problems > GPU problems for voice.

1. Network overhead between client & each model
2. Client code (streaming/websockets/sessions)
3. But STT/TTS optimization w/ TRT-LLM matters too
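Point 1 above, network overhead per model hop, is usually tracked as time-to-first-byte. A minimal sketch of measuring TTFB over a streamed response, with a fake in-memory stream standing in for the real websocket (everything here is illustrative):

```python
import asyncio
import time

# Simulated streaming response; in a real voice stack this would be a
# websocket or HTTP stream from the STT/LLM/TTS services.
async def fake_token_stream():
    for token in ["Hel", "lo ", "wor", "ld"]:
        await asyncio.sleep(0.01)  # stand-in for network + inference latency
        yield token

async def measure_ttfb() -> float:
    """Time until the first chunk arrives: the metric that dominates perceived lag."""
    start = time.perf_counter()
    async for _ in fake_token_stream():
        return time.perf_counter() - start

ttfb = asyncio.run(measure_ttfb())
print(f"TTFB: {ttfb * 1000:.1f} ms")
```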
AI Engineer (@aidotengineer)'s Twitter Profile Photo

Announcing our speakers for the Voice track!

⚠️PSA: Tix nearly sold out, get em here: ti.to/software-3/ai-……

Featuring:
- kwindla, CEO, Daily
- Sean DuBois, WebRTC and Realtime API, OpenAI
- Brooke Hopkins, Founder, coval
- @dkundel, Developer Experience, OpenAI