DeepEval (@deepeval) 's Twitter Profile
DeepEval

@deepeval

The Open-Source LLM Evaluation Framework – created and maintained by @confident_ai

GitHub: github.com/confident-ai/d…

ID: 1888058644434141184

linkhttps://github.com/confident-ai/deepeval calendar_today08-02-2025 02:54:05

35 Tweet

58 Followers

2 Following

DeepEval (@deepeval) 's Twitter Profile Photo

🚧 One of the biggest barriers to LLM evaluation today? The steep learning curve. You’re expected to: - Understand how each metric works - Interpret what the results actually mean - Compare across multiple (often conflicting) metrics And if you get it wrong? You might end up

🚧 One of the biggest barriers to LLM evaluation today?

The steep learning curve.

You’re expected to:

- Understand how each metric works
- Interpret what the results actually mean
- Compare across multiple (often conflicting) metrics

And if you get it wrong? You might end up
Jeffrey 🐬 confident-ai.com (@jeffr_yyy) 's Twitter Profile Photo

Thrilled to announce that DeepEval now supports LiteLLM as a native model provider. Developers can now use hundreds of models without having to change the LiteLLM (YC W23) models they're already using. More to come in the following days.

Thrilled to announce that <a href="/deepeval/">DeepEval</a> now supports LiteLLM as a native model provider. Developers can now use hundreds of models without having to change the <a href="/LiteLLM/">LiteLLM (YC W23)</a>  models they're already using. More to come in the following days.
Jeffrey 🐬 confident-ai.com (@jeffr_yyy) 's Twitter Profile Photo

In the week of June DeepEval decided to embrace OpenAI's API messages format for evaluating multi-turn conversations - and as a result almost tripled the total number of conversational evals ran from all our users. As a result we're releasing a native OpenAI integration later

In the week of June <a href="/deepeval/">DeepEval</a> decided to embrace OpenAI's API messages format for evaluating multi-turn conversations - and as a result almost tripled the total number of conversational evals ran from all our users.

As a result we're releasing a native OpenAI integration later
Jeffrey 🐬 confident-ai.com (@jeffr_yyy) 's Twitter Profile Photo

The most neglected pages on Confident AI are the settings, organization, and project pages. Kritin Vongthongsri and I sat down over the weekend and refactored + redesigned 10k lines of code to make things more professional. Does this look better now?

Huzaifa Rashid (@huzaifa1_0) 's Twitter Profile Photo

Just learned Python decorators & found out you can trace your entire LLM app with just observe Most LLM evals: “Something’s broken. Somewhere.” With DeepEval: 🔹 Trace tools, retrievers, LLMs 🔹 Add custom metrics 🔹 Visualize what’s working No refactor needed.

Mayank (@themayanksol) 's Twitter Profile Photo

"perfection is the enemy of good" - voltaire shipped DeepEval 1. enable tracing with hierarchy of spans LangChain agents 2. initial iteration of CrewAI integration with deepeval to trace your LLM spans 3. initial iteration of LlamaIndex 🦙 integration with deepeval..

LlamaIndex 🦙 (@llama_index) 's Twitter Profile Photo

This guest post from DeepEval shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation: 🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components 🔧 Set up

This guest post from <a href="/deepeval/">DeepEval</a> shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation:

🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components
🔧 Set up
Jeffrey 🐬 confident-ai.com (@jeffr_yyy) 's Twitter Profile Photo

One of the most beautiful things about talking to users is you realize where they are struggling. We're making two major releases at DeepEval this week to address the most common issues people face when running evals.

DeepEval (@deepeval) 's Twitter Profile Photo

Don’t get frustrated by writing print statements and endlessly scrolling terminal logs to debug your LangChain (and LangGraph) app. Trace your agent’s execution steps in production on Confident AI using our callback handler, with just two lines of code. Documentation:

Don’t get frustrated by writing print statements and endlessly scrolling terminal logs to debug your
<a href="/LangChainAI/">LangChain</a> (and LangGraph) app.

Trace your agent’s execution steps in production on <a href="/confident_ai/">Confident AI</a> using our callback handler, with just two lines of code.

Documentation:
Jeffrey 🐬 confident-ai.com (@jeffr_yyy) 's Twitter Profile Photo

🤖 Two LLM outputs walk into an arena… Only one leaves with the crown 👑 ⚔️ Pairwise battles ⚖️ Elo-style scoring 🙈 Blind trials 🧠 LLMs judging LLMs No complex metrics. Just ask: which one is better? confident-ai.com/blog/llm-arena…

Confident AI (@confident_ai) 's Twitter Profile Photo

🚀 Shipped Integration You can now trace your CrewAI apps on the Confident AI platform. By just adding 2 lines of code in your app, you can get the entire execution steps of your agent in the form of a single trace. Leverage your LLM application performance using

🚀 Shipped Integration

You can now trace your <a href="/crewAIInc/">CrewAI</a>  apps on the <a href="/confident_ai/">Confident AI</a> platform. By just adding 2 lines of code in your app, you can get the entire execution steps of your agent in the form of a single trace.

Leverage your LLM application performance using
Y Combinator (@ycombinator) 's Twitter Profile Photo

.@Confident_AI's DeepTeam is an open-source red teaming framework for AI agents. Test for memory leaks, goal hijacking, and decision flaws across 40+ attack types & vulnerabilities. Congrats on the launch, Jeffrey 🐬 confident-ai.com and Kritin Vongthongsri! github.com/confident-ai/d…