Aleix Conchillo Flaqué (@aconchillo) 's Twitter Profile
Aleix Conchillo Flaqué

@aconchillo

a tiny schemer on mastodon @[email protected], engineering @trydaily and @pipecat_ai core maintainer.

ID: 88207021

Link: https://github.com/aconchillo

Joined: 07-11-2009 15:32:00

1.1K Tweets

375 Followers

62 Following

anant (@anant_world) 's Twitter Profile Photo

introducing vibe editing a voice agent built into your document editor making docs feel like convos — voice as the new keyboard

kwindla (@kwindla) 's Twitter Profile Photo

We wrote down everything we've learned building voice AI agents over the past two years. Core technology choices, minimizing latency, managing multimodal context, interruption handling, turn detection, evals, state machines, guardrails, memory, async and realtime function

kwindla (@kwindla) 's Twitter Profile Photo

Vibe editing! (Vibe writing?) (Vibe essaying?) I love this demo of voice-driven interactive writing. ➡️ Gemini Multimodal Live API for the speech-to-speech interactions. ➡️ Pipecat Cloud for the voice AI infrastructure, agent orchestration, and WebRTC audio transport.

kwindla (@kwindla) 's Twitter Profile Photo

[ Hoisting this out of another thread here on X ... ] I'm having a lot of conversations lately about voice agent components and, relatedly, cost. Lots of people are new to voice AI and are exploring the options! You can build voice agents using a "full stack" platform. Vapi

kwindla (@kwindla) 's Twitter Profile Photo

Can you beat my 1-929-LLM-GAME high score? We've been exploring what you can do with speech-to-speech models. Here's a word guessing game, built with the Gemini Multimodal Live API, Vercel, and Twilio, that has a bunch of interesting features ... 🧵

kwindla (@kwindla) 's Twitter Profile Photo

Announcing: Voice AI course and online community ... swyx and I are hosting a month-long technical deep dive into Voice AI and Voice Agents. Our goals are to: ➡️ cover all the lessons we've learned over the last two years building realtime, conversational AI, ➡️ host fun

kwindla (@kwindla) 's Twitter Profile Photo

Voice agents + MCP ... When I watched this code walk-through from Laserdisc Librarian, I thought "wait, why didn't she edit out the LLM making those mistakes at the beginning ... oh I get it, good demo!" Vanessa shows how to use multiple MCP servers via the new `MCPClient` class in

Aleix Conchillo Flaqué (@aconchillo) 's Twitter Profile Photo

Did you know that the "cat" in Pipecat doesn't actually refer to a cat? I think it's a very easy one... but does anyone know what it could be referring to? Pipecat AI

kwindla (@kwindla) 's Twitter Profile Photo

Looking forward to doing a workshop at AI Engineer World's Fair on Tuesday: Building Voice Agents with Gemini and Pipecat. 10:40am in the workshop salon. Shrestha Basu Mallick and Philipp Schmid from Google, and Mark Backman and Aleix Conchillo Flaqué who work on Pipecat, will be there with voice

kwindla (@kwindla) 's Twitter Profile Photo

Full house for the Gemini x Pipecat hands-on workshop at AI Engineer World’s Fair. Link to repo Mark created as a starter kit in 🧵

kwindla (@kwindla) 's Twitter Profile Photo

This is my periodic appreciation post about the amazing Krisp noise reduction models. Voice AI working perfectly in a very noisy environment. You can use the Krisp models free in Daily’s voice AI hosting platform, Pipecat Cloud.

kwindla (@kwindla) 's Twitter Profile Photo

Talk to Cartesia speech-to-text about Cartesia speech-to-text. Cartesia launched a streaming STT model today, called Ink-Whisper, that's optimized for realtime voice AI. Pipecat AI has launch-day support for this new model, so I figured I'd talk to the model about itself.

Aleix Conchillo Flaqué (@aconchillo) 's Twitter Profile Photo

During my college years, my friends and I started a demoscene group (Anaconda). Our second demo was called The Requiem (youtube.com/watch?v=eQLp4V…). The other day I woke up to a surprise in our group chat, an AI-glitched version (sound on)... goosebumps. We miss you, chochiwig!

kwindla (@kwindla) 's Twitter Profile Photo

Smart Turn v2: open source, native audio turn detection in 14 languages. New checkpoint of the open source, open data, open training code, semantic VAD model on Hugging Face, fal, and Pipecat AI.
- 3x faster inference (12ms on an L40)
- 14 languages (13 more than v1, which

kwindla (@kwindla) 's Twitter Profile Photo

You don't need a WebRTC server for voice agents. If you're deploying your own voice AI infrastructure, you should almost certainly be using the new(†) serverless WebRTC approach. Serverless is much simpler, which translates to faster development, better scaling, and higher
