Jeffrey 🐬 confident-ai.com (@jeffr_yyy) Twitter Tweets • TwiCopy

Tom Blomfield

5 months ago

I'm hosting a Y Combinator event for early-stage founders at Monzo 🏦 HQ in London on July 15th. You can apply for a ticket here: events.ycombinator.com/OCU91g6z4

thumb_up_off_alt134

chat_bubble_outline8

repeat18

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

Great finding DeepEval in today's AI engineer newsletter: aiengineering.beehiiv.com/p/build-and-de…

thumb_up_off_alt1

chat_bubble_outline0

repeat1

shareShare

In the week of June DeepEval decided to embrace OpenAI's API messages format for evaluating multi-turn conversations - and as a result almost tripled the total number of conversational evals ran from all our users. As a result we're releasing a native OpenAI integration later

In the week of June <a href="/deepeval/">DeepEval</a> decided to embrace OpenAI's API messages format for evaluating multi-turn conversations - and as a result almost tripled the total number of conversational evals ran from all our users.

As a result we're releasing a native OpenAI integration later

thumb_up_off_alt3

chat_bubble_outline0

repeat3

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

The most neglected pages on Confident AI are the settings, organization, and project pages. Kritin Vongthongsri and I sat down over the weekend and refactored + redesigned 10k lines of code to make things more professional. Does this look better now?

thumb_up_off_alt5

chat_bubble_outline1

repeat3

shareShare

Mayank

@themayanksol

5 months ago

"perfection is the enemy of good" - voltaire shipped DeepEval 1. enable tracing with hierarchy of spans LangChain agents 2. initial iteration of CrewAI integration with deepeval to trace your LLM spans 3. initial iteration of LlamaIndex 🦙 integration with deepeval..

thumb_up_off_alt3

chat_bubble_outline2

repeat2

shareShare

Kritin Vongthongsri

@kritinv07

5 months ago

You guys loved G-Eval, so we shipped Multimodal G-Eval. You can now evaluate any multimodal task (text2image, computer/browser use, etc) in plain english. We've also added 1. gpt-4.1 and o4-mini suppport for all multimodal metrics on DeepEval 2. Image support on Confident AI

thumb_up_off_alt5

chat_bubble_outline1

repeat3

shareShare

Kritin Vongthongsri

@kritinv07

5 months ago

Today, we've finally changed DeepEval's default eval model from gpt-4o to gpt-4.1... and it took around 5 minutes. Better late than never, I guess 😅.

Today, we've finally changed <a href="/deepeval/">DeepEval</a>'s default eval model from gpt-4o to gpt-4.1... and it took around 5 minutes.

Better late than never, I guess 😅.

thumb_up_off_alt6

chat_bubble_outline0

repeat2

shareShare

Kritin Vongthongsri

@kritinv07

5 months ago

Too lazy to sift through 100+ test cases? Confident AI just dropped 🪄AI Insights Board🪄 This means running evals with DeepEval instantly tells you: 1. What are your LLM app's strengths and weaknesses 2. Areas your LLM consistently struggles with 3. The exact model and

thumb_up_off_alt4

chat_bubble_outline2

repeat3

shareShare

LlamaIndex 🦙

@llama_index

5 months ago

This guest post from DeepEval shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation: 🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components 🔧 Set up

This guest post from <a href="/deepeval/">DeepEval</a> shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation:

🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components
🔧 Set up

thumb_up_off_alt26

chat_bubble_outline0

repeat5

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

Soham left and right

thumb_up_off_alt2

chat_bubble_outline0

repeat0

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

no idea why is it so hard to just find the list of all available OpenAI models

thumb_up_off_alt0

chat_bubble_outline1

repeat0

shareShare

soham

@soham_btw

5 months ago

what is r/opensource even about if not this?

thumb_up_off_alt5,5K

chat_bubble_outline406

repeat147

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

One of the most beautiful things about talking to users is you realize where they are struggling. We're making two major releases at DeepEval this week to address the most common issues people face when running evals.

thumb_up_off_alt3

chat_bubble_outline0

repeat2

shareShare

Kritin Vongthongsri

@kritinv07

5 months ago

At Confident AI, we say no to vibe coding, because we’re all about “vibes” coding.

thumb_up_off_alt6

chat_bubble_outline2

repeat3

shareShare

DeepEval

@deepeval

5 months ago

Don’t get frustrated by writing print statements and endlessly scrolling terminal logs to debug your LangChain (and LangGraph) app. Trace your agent’s execution steps in production on Confident AI using our callback handler, with just two lines of code. Documentation:

Don’t get frustrated by writing print statements and endlessly scrolling terminal logs to debug your
<a href="/LangChainAI/">LangChain</a> (and LangGraph) app.

Trace your agent’s execution steps in production on <a href="/confident_ai/">Confident AI</a> using our callback handler, with just two lines of code.

Documentation:

thumb_up_off_alt4

chat_bubble_outline0

repeat4

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

🤖 Two LLM outputs walk into an arena… Only one leaves with the crown 👑 ⚔️ Pairwise battles ⚖️ Elo-style scoring 🙈 Blind trials 🧠 LLMs judging LLMs No complex metrics. Just ask: which one is better? confident-ai.com/blog/llm-arena…

thumb_up_off_alt5

chat_bubble_outline0

repeat4

shareShare

Confident AI

@confident_ai

5 months ago

🚀 Shipped Integration You can now trace your CrewAI apps on the Confident AI platform. By just adding 2 lines of code in your app, you can get the entire execution steps of your agent in the form of a single trace. Leverage your LLM application performance using

🚀 Shipped Integration

You can now trace your <a href="/crewAIInc/">CrewAI</a> apps on the <a href="/confident_ai/">Confident AI</a> platform. By just adding 2 lines of code in your app, you can get the entire execution steps of your agent in the form of a single trace.

Leverage your LLM application performance using

thumb_up_off_alt6

chat_bubble_outline0

repeat3

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

One of the greatest blocker to LLM evaluation? Communication between engineers and domain experts when curating datasets. When engineers gets put on an AI project, they aren't necessarily experts in the domain they're building for. In fact, the humans safeguarding AI responses

thumb_up_off_alt3

chat_bubble_outline0

repeat2

shareShare

Y Combinator

@ycombinator

5 months ago

.@Confident_AI's DeepTeam is an open-source red teaming framework for AI agents. Test for memory leaks, goal hijacking, and decision flaws across 40+ attack types & vulnerabilities. Congrats on the launch, Jeffrey 🐬 confident-ai.com and Kritin Vongthongsri! github.com/confident-ai/d…

thumb_up_off_alt47

chat_bubble_outline4

repeat11

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

5 months ago

Another day, another launch :) Confident AI

thumb_up_off_alt5

chat_bubble_outline0

repeat1

shareShare