Quotient AI (@quotientai) Twitter Tweets • TwiCopy

John Berryman

5 months ago

This is super exciting! While the early eval companies dove into meaningless metrics ("helpfulness"!), Quotient is targeting the evaluations that really matter. If your agent is lying to your customers, you need to know about it. What's more, you need to figure out why!

Likes: 3 | Replies: 0 | Reposts: 2

AI Engineer

@aidotengineer

5 months ago

Announcing our speakers for the Retrieval + Search track! ⚠️ PSA: Tix nearly sold out, get em here: ti.to/software-3/ai-…… Featuring: Aman, Former Founder, Harvey; Jerry Liu, CEO, LlamaIndex 🦙; Julia Neagu, CEO, Quotient AI; changhiskhan, CEO, LanceDB

Likes: 15 | Replies: 0 | Reposts: 4

Julia Neagu

@juliaaneagu

5 months ago

We’re heading back to AI Engineer! Deanna Emery (founding AI Engineer at Quotient AI) and Maitar Asher 🎗️ (Head of Eng at tavily) are speaking on evaluating AI search. If you're building AI search, don't miss it.

Likes: 4 | Replies: 0 | Reposts: 1

Deanna Emery

@deannalemery

5 months ago

Can’t wait to be back at AI Engineer! I’m teaming up with Maitar Asher 🎗️ from tavily to talk about evaluating AI search. We’re sharing a practical eval framework, lessons from real-world deployments, and never-seen-before benchmark results. Hope to see you there!

Likes: 7 | Replies: 1 | Reposts: 3

Mingxuan (Aldous) Li

@itea1001

5 months ago

HypoEval evaluators (github.com/ChicagoHAI/Hyp…) are now incorporated into judges from Quotient AI — check it out at github.com/quotient-ai/ju…!

Likes: 4 | Replies: 0 | Reposts: 4

Julia Neagu

@juliaaneagu

5 months ago

HypoEval is now available in Quotient AI's OSS judges! It uses SOTA hypothesis generation with just 30 human annotations to create decomposed rubrics, enabling LLMs to score criteria clearly. Beats fine-tuned models (w/ 3x more labels). Thanks Mingxuan (Aldous) Li for contributing!

Likes: 10 | Replies: 2 | Reposts: 3

Julia Neagu

@juliaaneagu

5 months ago

detections go brrr

One week in, Quotient AI Detections has processed 20M+ tokens, analyzed tens of thousands of logs, and caught thousands of hallucinations across real AI production apps.

Still a long way to go, but we're committed to giving builders SOTA AI monitoring.

Likes: 8 | Replies: 1 | Reposts: 2

Julia Neagu

@juliaaneagu

5 months ago

Most engineers think you need ground truth data to detect AI hallucinations. You don't. Extrinsic hallucinations are the real problem: the model misusing the context you gave it. Here's a primer on how to do table-stakes hallucination detection without expensive datasets👇

Likes: 14 | Replies: 1 | Reposts: 2
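
The idea in the post above, checking a response against the context the model was given rather than against labeled ground truth, can be illustrated with a very rough sketch. This is not Quotient AI's method and not the primer the post links to. It is a crude lexical-overlap heuristic, and the function name `unsupported_sentences`, the 0.5 threshold, and the sample strings are all assumptions made for illustration. A real detector would more likely use an entailment model or an LLM judge.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens, dropping words of two characters or fewer."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}


def unsupported_sentences(response: str, context: str, threshold: float = 0.5) -> list[str]:
    """Return response sentences whose words are mostly absent from the context.

    Hypothetical helper: a sentence is flagged when fewer than `threshold`
    of its tokens also appear somewhere in the context. No ground-truth
    answer is needed, only the context that was passed to the model.
    """
    context_tokens = _tokens(context)
    flagged = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        support = len(words & context_tokens) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged


# Made-up example data, for illustration only.
context = "The refund window is 30 days. Refunds are issued to the original payment method."
response = "The refund window is 30 days. Refunds are also paid out in gift cards within one hour."
print(unsupported_sentences(response, context))
```

Word overlap misses paraphrases and cannot catch a sentence that reuses the context's vocabulary to say something false, so treat this only as a picture of the reference-free setup, not as a usable detector.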

Julia Neagu

@juliaaneagu

5 months ago

it was a pleasure speaking at AI Engineer with Maitar Asher 🎗️ from tavily and Deanna Emery from Quotient AI 🫡

Likes: 13 | Replies: 3 | Reposts: 3

Julia Neagu

@juliaaneagu

5 months ago

retrieval + search track = best vibes AI Engineer ft Maitar Asher 🎗️ Deanna Emery Jerry Liu tavily Quotient AI LlamaIndex 🦙

Likes: 11 | Replies: 1 | Reposts: 4

Rohan Paul

@rohanpaul_ai

5 months ago

Today's edition (8-Jun) of my newsletter is ready. (Consider subscribing, I write it daily. Link in comments & bio and also you will get a 1300+ page Python book as soon as you subscribe). Prompting with AI scales, Verifying doesn't

Likes: 18 | Replies: 3 | Reposts: 3

Tool Use

@tooluseai

4 months ago

Do you need evals for your AI project? Freddie Vargus joins us this week to share his experience from Quotient AI and GitHub Copilot

Likes: 3 | Replies: 0 | Reposts: 4

Julia Neagu

@juliaaneagu

4 months ago

“You want your model hitting milestones, not minefields.” Most AI eval talk is hand-wavy. This isn’t. Freddie Vargus (Quotient AI CTO) gets into the weeds: how to actually test tool use, avoid minefields, and build agents that don’t break. Check out the recording👇

Likes: 6 | Replies: 0 | Reposts: 5

Julia Neagu

@juliaaneagu

4 months ago

Just shared the slides from our AI Engineer World Fair talk: Evaluating AI Search – A Practical Framework for Augmented Systems.

As more AI agents rely on real-time data (like the web!), traditional eval approaches are falling behind and don't capture what's actually

Likes: 32 | Replies: 4 | Reposts: 7

Julia Neagu

@juliaaneagu

4 months ago

AI Engineer Looking for more resources (think: research, OSS libraries, cookbooks and more!) for AI reliability? We have that! Check out Quotient AI Alpha, our collection of tools, resources and research. more coming weekly 👀

Likes: 3 | Replies: 0 | Reposts: 1

Julia Neagu

@juliaaneagu

4 months ago

The worst part of building an agent? You don’t know it’s broken until your users tell you. We just dropped a cookbook for a web research agent with real time monitoring — so you can catch critical issues as they happen. ft. tavily LangChain OpenAI Quotient AI

Likes: 66 | Replies: 5 | Reposts: 8

Julia Neagu

@juliaaneagu

4 months ago

What did Freddie Vargus see? 👀 Everyone’s talking about context engineering now. Freddie knew months ago: context is the product.

Likes: 9 | Replies: 1 | Reposts: 3

jason liu - vacation mode

@jxnlco

4 months ago

how do i catch hallucinations? come learn to implement monitoring systems that catch AI errors as they happen in live production environments with Julia Neagu and Quotient AI. if you register, you'll be sent the recording and study notes after they're done!

Likes: 9 | Replies: 1 | Reposts: 3

Julia Neagu

@juliaaneagu

4 months ago

If you're shipping LLMs to production and still finding out about critical issues from your users, this course is for you. Real-time evals, automated detection, and the tools we use at Quotient AI to keep AI grounded. On July 30th jason liu and myself are laying it all out.

Likes: 11 | Replies: 0 | Reposts: 3

Julia Neagu

@juliaaneagu

4 months ago

DMs OPEN for topics you want covered.

I write my talks the night before.

it's a really bad habit.

it stresses out Deanna Emery

Likes: 6 | Replies: 1 | Reposts: 2