Quotient AI (@quotientai) 's Twitter Profile
Quotient AI

@quotientai

We're building an advanced AI development and evaluation platform. Join our test kitchen: discord.gg/YeJzANpntv

ID: 1679732770367578115

Link: http://www.quotientai.co

Joined: 14-07-2023 06:01:36

68 Tweets

211 Followers

6 Following

John Berryman (@jnbrymn)

This is super exciting! While the early eval companies dove into meaningless metrics ("helpfulness"!), Quotient is targeting the evaluations that really matter. If your agent is lying to your customers, you need to know about it. What's more, you need to figure out why!

AI Engineer (@aidotengineer)

Announcing our speakers for the Retrieval + Search track! ⚠️PSA: Tix nearly sold out, get em here: ti.to/software-3/ai-…

Featuring:
Aman, Former Founder, Harvey
Jerry Liu, CEO, LlamaIndex 🦙
Julia Neagu, CEO, Quotient AI
changhiskhan, CEO, LanceDB

Deanna Emery (@deannalemery)

Can't wait to be back at AI Engineer! I'm teaming up with Maitar Asher 🎗️ from tavily to talk about evaluating AI search. We're sharing a practical eval framework, lessons from real-world deployments, and never-seen-before benchmark results. Hope to see you there!

Mingxuan (Aldous) Li (@itea1001)

HypoEval evaluators (github.com/ChicagoHAI/Hyp…) are now incorporated into judges from Quotient AI - check it out at github.com/quotient-ai/ju…!

Julia Neagu (@juliaaneagu)

HypoEval is now available in Quotient AI's OSS judges! It uses SOTA hypothesis generation with just 30 human annotations to create decomposed rubrics, enabling LLMs to score criteria clearly. Beats fine-tuned models (w/ 3x more labels). Thanks Mingxuan (Aldous) Li for contributing!

Julia Neagu (@juliaaneagu)

detections go brrr One week in, Quotient AI Detections has processed 20M+ tokens, analyzed tens of thousands of logs, and caught thousands of hallucinations across real AI production apps. Still a long way to go, but we're committed to giving builders SOTA AI monitoring.

Julia Neagu (@juliaaneagu)

Most engineers think you need ground truth data to detect AI hallucinations. You don't. Extrinsic hallucinations are the real problem: model misusing the context you gave it. Here's a primer on how to do table stakes hallucination detection without expensive datasets 👇

Rohan Paul (@rohanpaul_ai)

Today's edition (8-Jun) of my newsletter is ready. (Consider subscribing, I write it daily. Link in comments & bio and also you will get a 1300+ page Python book as soon as you subscribe). Prompting with AI scales, Verifying doesn't

Julia Neagu (@juliaaneagu)

"You want your model hitting milestones, not minefields." Most AI eval talk is hand-wavy. This isn't. Freddie Vargus (Quotient AI CTO) gets into the weeds: how to actually test tool use, avoid minefields, and build agents that don't break. Check out the recording 👇

Julia Neagu (@juliaaneagu)

Just shared the slides from our AI Engineer World Fair talk: Evaluating AI Search – A Practical Framework for Augmented Systems. As more AI agents rely on real-time data (like the web!), traditional eval approaches are falling behind and don't capture what's actually

Julia Neagu (@juliaaneagu)

AI Engineer Looking for more resources (think: research, OSS libraries, cookbooks and more!) for AI reliability? We have that! Check out Quotient AI Alpha, our collection of tools, resources and research. More coming weekly 👀

Julia Neagu (@juliaaneagu)

The worst part of building an agent? You don't know it's broken until your users tell you. We just dropped a cookbook for a web research agent with real time monitoring - so you can catch critical issues as they happen. ft. tavily LangChain OpenAI Quotient AI

jason liu - vacation mode (@jxnlco)

how do i catch hallucinations? come learn to implement monitoring systems that catch AI errors as they happen in live production environments with Julia Neagu and Quotient AI if you register, you'll be sent the recording and study notes after they're done!

Julia Neagu (@juliaaneagu)

If you're shipping LLMs to production and still finding out about critical issues from your users, this course is for you. Real-time evals, automated detection, and the tools we use at Quotient AI to keep AI grounded. On July 30th jason liu and myself are laying it all out.
