DeepEval (@deepeval) Twitter Tweets • TwiCopy

DeepEval

3 months ago

🚧 One of the biggest barriers to LLM evaluation today? The steep learning curve. You’re expected to: - Understand how each metric works - Interpret what the results actually mean - Compare across multiple (often conflicting) metrics And if you get it wrong? You might end up

thumb_up_off_alt6

chat_bubble_outline0

repeat3

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

Thrilled to announce that DeepEval now supports LiteLLM as a native model provider. Developers can now use hundreds of models without having to change the LiteLLM (YC W23) models they're already using. More to come in the following days.

Thrilled to announce that <a href="/deepeval/">DeepEval</a> now supports LiteLLM as a native model provider. Developers can now use hundreds of models without having to change the <a href="/LiteLLM/">LiteLLM (YC W23)</a> models they're already using. More to come in the following days.

thumb_up_off_alt5

chat_bubble_outline1

repeat3

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

Great finding DeepEval in today's AI engineer newsletter: aiengineering.beehiiv.com/p/build-and-de…

thumb_up_off_alt1

chat_bubble_outline0

repeat1

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

In the week of June DeepEval decided to embrace OpenAI's API messages format for evaluating multi-turn conversations - and as a result almost tripled the total number of conversational evals ran from all our users. As a result we're releasing a native OpenAI integration later

In the week of June <a href="/deepeval/">DeepEval</a> decided to embrace OpenAI's API messages format for evaluating multi-turn conversations - and as a result almost tripled the total number of conversational evals ran from all our users.

As a result we're releasing a native OpenAI integration later

thumb_up_off_alt3

chat_bubble_outline0

repeat3

shareShare

DeepEval

@deepeval

3 months ago

Mult-turn evals are now extremely popular for users on DeepEval.

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

The most neglected pages on Confident AI are the settings, organization, and project pages. Kritin Vongthongsri and I sat down over the weekend and refactored + redesigned 10k lines of code to make things more professional. Does this look better now?

thumb_up_off_alt5

chat_bubble_outline1

repeat3

shareShare

Huzaifa Rashid

@huzaifa1_0

3 months ago

Just learned Python decorators & found out you can trace your entire LLM app with just observe Most LLM evals: “Something’s broken. Somewhere.” With DeepEval: 🔹 Trace tools, retrievers, LLMs 🔹 Add custom metrics 🔹 Visualize what’s working No refactor needed.

thumb_up_off_alt3

chat_bubble_outline1

repeat1

shareShare

Mayank

@themayanksol

3 months ago

"perfection is the enemy of good" - voltaire shipped DeepEval 1. enable tracing with hierarchy of spans LangChain agents 2. initial iteration of CrewAI integration with deepeval to trace your LLM spans 3. initial iteration of LlamaIndex 🦙 integration with deepeval..

thumb_up_off_alt3

chat_bubble_outline2

repeat2

shareShare

Kritin Vongthongsri

@kritinv07

3 months ago

You guys loved G-Eval, so we shipped Multimodal G-Eval. You can now evaluate any multimodal task (text2image, computer/browser use, etc) in plain english. We've also added 1. gpt-4.1 and o4-mini suppport for all multimodal metrics on DeepEval 2. Image support on Confident AI

thumb_up_off_alt5

chat_bubble_outline1

repeat3

shareShare

John Au-Yeung

@aumayeung

3 months ago

Avi Chawla This is 🔥 Love how DeepEval makes deep LLM tracing dead simple without touching existing code.

thumb_up_off_alt2

chat_bubble_outline0

repeat1

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

DeepEval on trending again, seems like the recent component-level evals is definitely something people need for evaluating AI agents.

<a href="/deepeval/">DeepEval</a> on trending again, seems like the recent component-level evals is definitely something people need for evaluating AI agents.

thumb_up_off_alt4

chat_bubble_outline0

repeat3

shareShare

Kritin Vongthongsri

@kritinv07

3 months ago

Today, we've finally changed DeepEval's default eval model from gpt-4o to gpt-4.1... and it took around 5 minutes. Better late than never, I guess 😅.

Today, we've finally changed <a href="/deepeval/">DeepEval</a>'s default eval model from gpt-4o to gpt-4.1... and it took around 5 minutes.

Better late than never, I guess 😅.

thumb_up_off_alt6

chat_bubble_outline0

repeat2

shareShare

Kritin Vongthongsri

@kritinv07

3 months ago

Too lazy to sift through 100+ test cases? Confident AI just dropped 🪄AI Insights Board🪄 This means running evals with DeepEval instantly tells you: 1. What are your LLM app's strengths and weaknesses 2. Areas your LLM consistently struggles with 3. The exact model and

thumb_up_off_alt4

chat_bubble_outline2

repeat3

shareShare

LlamaIndex 🦙

@llama_index

3 months ago

This guest post from DeepEval shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation: 🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components 🔧 Set up

This guest post from <a href="/deepeval/">DeepEval</a> shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation:

🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components
🔧 Set up

thumb_up_off_alt26

chat_bubble_outline0

repeat5

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

One of the most beautiful things about talking to users is you realize where they are struggling. We're making two major releases at DeepEval this week to address the most common issues people face when running evals.

thumb_up_off_alt3

chat_bubble_outline0

repeat2

shareShare

Kritin Vongthongsri

@kritinv07

3 months ago

At Confident AI, we say no to vibe coding, because we’re all about “vibes” coding.

thumb_up_off_alt6

chat_bubble_outline2

repeat3

shareShare

DeepEval

@deepeval

3 months ago

Don’t get frustrated by writing print statements and endlessly scrolling terminal logs to debug your LangChain (and LangGraph) app. Trace your agent’s execution steps in production on Confident AI using our callback handler, with just two lines of code. Documentation:

Don’t get frustrated by writing print statements and endlessly scrolling terminal logs to debug your
<a href="/LangChainAI/">LangChain</a> (and LangGraph) app.

Trace your agent’s execution steps in production on <a href="/confident_ai/">Confident AI</a> using our callback handler, with just two lines of code.

Documentation:

thumb_up_off_alt4

chat_bubble_outline0

repeat4

shareShare

Jeffrey 🐬 confident-ai.com

@jeffr_yyy

3 months ago

🤖 Two LLM outputs walk into an arena… Only one leaves with the crown 👑 ⚔️ Pairwise battles ⚖️ Elo-style scoring 🙈 Blind trials 🧠 LLMs judging LLMs No complex metrics. Just ask: which one is better? confident-ai.com/blog/llm-arena…

thumb_up_off_alt5

chat_bubble_outline0

repeat4

shareShare

Confident AI

@confident_ai

3 months ago

🚀 Shipped Integration You can now trace your CrewAI apps on the Confident AI platform. By just adding 2 lines of code in your app, you can get the entire execution steps of your agent in the form of a single trace. Leverage your LLM application performance using

🚀 Shipped Integration

You can now trace your <a href="/crewAIInc/">CrewAI</a> apps on the <a href="/confident_ai/">Confident AI</a> platform. By just adding 2 lines of code in your app, you can get the entire execution steps of your agent in the form of a single trace.

Leverage your LLM application performance using

thumb_up_off_alt6

chat_bubble_outline0

repeat3

shareShare

Y Combinator

@ycombinator

3 months ago

.@Confident_AI's DeepTeam is an open-source red teaming framework for AI agents. Test for memory leaks, goal hijacking, and decision flaws across 40+ attack types & vulnerabilities. Congrats on the launch, Jeffrey 🐬 confident-ai.com and Kritin Vongthongsri! github.com/confident-ai/d…

thumb_up_off_alt47

chat_bubble_outline4

repeat11

shareShare