Deanna Emery (@deannalemery)'s Twitter Profile
Deanna Emery

@deannalemery

Founding AI Researcher at Quotient AI, ex-astrophysicist, bike tourer 🚴‍♀️ UC Berkeley, Harvard

ID: 1388196676708470791

Joined: 30-04-2021 18:21:10

48 Tweets

48 Followers

94 Following

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Most teams only find out their AI is broken when someone complains or churns. Your agents shouldn’t fail silently. We’re launching Quotient AI Detections: a system to catch agent mistakes, identify how they happened, and automatically fix them.

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

We just launched Quotient AI Detections: the first system that helps teams catch AI failures before their users do. As part of the launch, we partnered with ElevenLabs to offer coupons through the AI Engineer Pack:
→ 1,000,000 extra logs
→ 10,000 free detections
→ $250+

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

We’re teaming up with Vercel to support the next wave of AI Accelerator builders. Participants and winners get $29K in Quotient credits to monitor their AI agents for failures from day 1, plus hands-on support from our team. 👇Apps close May 17. Apply now!

AI Engineer (@aidotengineer)'s Twitter Profile Photo

Announcing our speakers for the Retrieval + Search track! ⚠️ PSA: Tix nearly sold out, get 'em here: ti.to/software-3/ai-… Featuring:
Aman, Former Founder, Harvey
Jerry Liu, CEO, LlamaIndex 🦙
Julia Neagu, CEO, Quotient AI
changhiskhan, CEO, LanceDB

Deanna Emery (@deannalemery)'s Twitter Profile Photo

Can’t wait to be back at AI Engineer! I’m teaming up with Maitar Asher 🎗️ from tavily to talk about evaluating AI search. We’re sharing a practical eval framework, lessons from real-world deployments, and never-seen-before benchmark results. Hope to see you there!

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

HypoEval is now available in Quotient AI's OSS judges! It uses SOTA hypothesis generation with just 30 human annotations to create decomposed rubrics, enabling LLMs to score criteria clearly. Beats fine-tuned models (w/ 3x more labels). Thanks Mingxuan (Aldous) Li for contributing!
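
The decomposed-rubric idea can be sketched in a few lines: each criterion is judged independently and the per-criterion scores are averaged. The `judge` callable and function name below are illustrative stand-ins, not Quotient's API; HypoEval itself generates the rubric hypotheses from ~30 human annotations.

```python
# Illustrative sketch only: scoring with a decomposed rubric, where each
# criterion is judged on its own and the results are averaged.
# `judge(response, criterion)` is a stand-in for an LLM asked a single,
# narrow question about one criterion, returning a score in [0, 1].

def score_with_rubric(response: str, rubric: list[str], judge) -> float:
    """Score `response` against each rubric criterion and average the results."""
    if not rubric:
        raise ValueError("rubric must contain at least one criterion")
    scores = [judge(response, criterion) for criterion in rubric]
    return sum(scores) / len(scores)
```

Averaging independent per-criterion judgments is what makes the scores interpretable: a low overall score points directly at the failing criteria.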

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Most engineers think you need ground truth data to detect AI hallucinations. You don't.

Extrinsic hallucinations are the real problem: the model misusing the context you gave it.

Here's a primer on how to do table stakes hallucination detection without expensive datasets👇

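
A crude sketch of the idea, assuming nothing about Quotient's actual detector: flag answer sentences whose content words barely overlap the supplied context. A production system would use an NLI model or LLM judge instead of token overlap, but the no-ground-truth shape is the same — you only need the context the model was given, never a reference answer.

```python
# Crude illustration of context-grounded ("extrinsic") hallucination
# detection: no ground-truth answers needed, only the supplied context.
# Each answer sentence is flagged when too few of its content words
# appear anywhere in the context.
import re

def flag_unsupported(answer: str, context: str, threshold: float = 0.5) -> list[str]:
    """Return answer sentences whose content-word overlap with `context`
    falls below `threshold`."""
    context_words = set(re.findall(r"[a-z']+", context.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        # Keep only words longer than 3 characters as "content" words.
        words = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        if not words:
            continue
        support = sum(w in context_words for w in words) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged
```

Swapping the overlap score for an entailment probability upgrades this from table stakes to a real detector without changing the interface.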
Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Just shared the slides from our AI Engineer World Fair talk: Evaluating AI Search – A Practical Framework for Augmented Systems.

As more AI agents rely on real-time data (like the web!), traditional eval approaches are falling behind and don't capture what's actually…

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

The worst part of building an agent? You don’t know it’s broken until your users tell you. We just dropped a cookbook for a web research agent with real-time monitoring, so you can catch critical issues as they happen. ft. tavily, LangChain, OpenAI, Quotient AI
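
The monitoring idea can be sketched as a thin wrapper around each agent step, assuming a hypothetical step interface (this is not the cookbook's API): log a warning the moment a step raises or returns nothing, instead of waiting for a user to report it.

```python
# Minimal sketch of real-time agent monitoring: wrap each step so failures
# surface as they happen. The step interface here is hypothetical.
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("agent-monitor")

def monitored_step(name: str, fn, *args, **kwargs):
    """Run one agent step, logging a warning on exceptions or empty results."""
    try:
        result = fn(*args, **kwargs)
    except Exception:
        log.warning("step %s raised", name, exc_info=True)
        raise
    if not result:
        log.warning("step %s returned an empty result", name)
    return result
```

In practice the warnings would go to a monitoring backend rather than stderr, but the hook point (one wrapper per step) is the same.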

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

What did Freddie Vargus see? 👀 Everyone’s talking about context engineering now. Freddie knew months ago: context is the product.

jason liu - vacation mode (@jxnlco)'s Twitter Profile Photo

how do i catch hallucinations? come learn to implement monitoring systems that catch AI errors as they happen in live production environments with Julia Neagu and Quotient AI. if you register, you'll be sent the recording and study notes after they're done!

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Just dropped: three new cookbooks for building AI research agents with Exa, LangChain, OpenAI, and Anthropic — now with built-in monitoring from Quotient AI. Track search relevance. Catch hallucinations. Debug real-world agents as they run.

Freddie Vargus (@freddie_v4)'s Twitter Profile Photo

today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models
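
For contrast with the learned detector above, here is a purely rule-based sketch of tool-usage checking: validating a call against a JSON-Schema-style tool definition catches only structural mistakes (unknown tool, missing or mistyped arguments), which is exactly the gap a trained model fills with semantic judgment. All names below are illustrative, not the model's interface.

```python
# Rule-based stand-in for tool-usage checking. Tool definitions follow the
# common JSON-Schema-ish shape used by MCP / function-calling APIs:
# {"name": ..., "parameters": {"properties": {...}, "required": [...]}}.

TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def check_tool_call(call: dict, tool_def: dict) -> list[str]:
    """Return a list of problems with `call` relative to `tool_def` (empty = OK)."""
    problems = []
    if call.get("name") != tool_def["name"]:
        problems.append(f"unknown tool: {call.get('name')!r}")
        return problems
    params = tool_def.get("parameters", {})
    props = params.get("properties", {})
    args = call.get("arguments", {})
    for required in params.get("required", []):
        if required not in args:
            problems.append(f"missing required argument: {required!r}")
    for arg_name, value in args.items():
        if arg_name not in props:
            problems.append(f"unexpected argument: {arg_name!r}")
        else:
            expected = TYPE_MAP.get(props[arg_name].get("type"))
            if expected and not isinstance(value, expected):
                problems.append(f"wrong type for {arg_name!r}")
    return problems
```

A learned detector goes further: it can flag a call that is schema-valid but semantically wrong for the task, which rules like these can't see.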

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

WHAT A DAY: Quotient AI limbic-tool-use-0.5B is trending on Hugging Face! it’s now the top fine-tuned model for Qwen2.5-0.5B-Instruct. appreciate all the support ❤️ more coming soon 👀

Freddie Vargus (@freddie_v4)'s Twitter Profile Photo

thanks to everyone for the excitement and questions. we've got more stuff cooking, and wanted to share more about the data curation & training pipeline from our researcher @deannalemery in our blog below 👇

Freddie Vargus (@freddie_v4)'s Twitter Profile Photo

Read about how it all starts with good, realistic data: blog.quotientai.co/introducing-li…

we pulled publicly available MCP server tool information from over 150 MCP servers, extracting tool definitions and metadata, and then reformatting the server info so our data was exploded into 1…

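
The "explode" step described in that thread might look roughly like this, with assumed field names rather than Quotient's actual schema: each server's tool listing is flattened into one record per tool, so downstream data generation can sample tools independently.

```python
# Sketch of exploding per-server MCP tool listings into one flat record
# per tool. Field names are assumptions for illustration.

def explode_tools(servers: list[dict]) -> list[dict]:
    """Flatten [{"name": ..., "tools": [...]}] into one record per tool."""
    records = []
    for server in servers:
        for tool in server.get("tools", []):
            records.append({
                "server": server["name"],
                "tool": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool.get("parameters", {}),
            })
    return records
```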
Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Agents still mess up tool calls.

Last week we released limbic-tool-use-0.5B - a small model that catches those mistakes better than GPT-4.1.

Now we're sharing how we did it: dataset, training pipeline, benchmark.

162 MCP server tools
50M+ tokens
1 Modal H100

read below

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Our talk with @Tavily is now live — part of the new AI Engineer Retrieval & Search track. We share a practical framework for evaluating AI-powered search, and results from benchmarking popular context-augmented systems. If you’re building AI search, this one’s for you.