ragas (@ragas_io)'s Twitter Profile
ragas

@ragas_io

Supercharge Your LLM Application Evaluations 🚀

GitHub: github.com/explodinggradi…
Discord: discord.gg/5djav8GGNZ


Website: https://ragas.io/ · Joined: 05-03-2024 07:13:30

106 Tweets

921 Followers

0 Following

ikka (@shahules786)

A fun weekend project turned out to be an example on how to evaluate simple LLM agents using simulation. I was surprised to see how brittle even the latest LLMs are to different edge-case scenarios. ⭐️

Application
LLM is an agent collecting personal user information (name, SSID,
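As a generic illustration of the simulation idea described above (not the author's project code), the harness below probes an agent with edge-case user personas and checks a simple safety invariant. `call_agent` and the persona list are hypothetical stand-ins for the agent and scenarios under test.

```python
# Illustrative sketch only: simulate edge-case users against an
# information-collecting agent and assert invariants on its replies.
def call_agent(history: list[dict]) -> str:
    # Hypothetical placeholder: swap in your real LLM agent call here.
    return "Could you confirm the spelling of your name?"

EDGE_CASE_PERSONAS = [
    "Gives a fake name, then corrects it mid-conversation",
    "Refuses to share any ID and asks why it is needed",
    "Tries to get the agent to reveal data from a previous user",
]

def simulate(persona: str, max_turns: int = 5) -> list[dict]:
    history = [{"role": "user", "content": f"(persona: {persona}) Hi."}]
    for _ in range(max_turns):
        reply = call_agent(history)
        history.append({"role": "assistant", "content": reply})
        # In a real harness a second LLM would play the persona here;
        # we stop after one turn to keep the sketch self-contained.
        break
    return history

for persona in EDGE_CASE_PERSONAS:
    transcript = simulate(persona)
    # One example invariant: the agent never echoes another user's data.
    assert "previous user" not in transcript[-1]["content"].lower()
```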
ragas (@ragas_io)

Creating synthetic test data that reflects your production use case is hard. However, there is one technique that can make a lot of difference if used correctly: conditioning model generation on personas.

Instead of generic, one-size-fits-all questions, craft test cases using
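In recent Ragas versions this can be expressed by defining personas and handing them to the test set generator. A minimal sketch, with API names as given in the Ragas 0.2.x docs (worth verifying against your installed version); the persona descriptions are made-up examples:

```python
# Sketch of persona-conditioned test data generation with Ragas.
from ragas.testset.persona import Persona

personas = [
    Persona(
        name="new joiner",
        role_description="An engineer in their first week, asking basic "
        "setup questions in informal language.",
    ),
    Persona(
        name="compliance officer",
        role_description="Asks precise policy questions and expects "
        "citations to the source document.",
    ),
]
# These can then be passed to the test set generator, e.g.
# TestsetGenerator(llm=..., embedding_model=..., persona_list=personas),
# so each synthetic query is phrased from that persona's point of view.
```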
ikka (@shahules786)

Ragas office hours are becoming a big hit among the community. In just the last week we did office hours with 5 of the Fortune 50 companies building LLM apps.

What do we do differently from all others? We don’t recommend tools; instead, we recommend processes and opinions based on

ragas (@ragas_io)

Introducing NVIDIA’s RAG metrics in Ragas: new metrics for end-to-end accuracy, relevance, and groundedness, engineered to deliver robust, fast, and cost-effective performance.

1️⃣ Answer Accuracy: End-to-end measurement ensures the RAG’s response perfectly aligns with the ground
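A minimal sketch of scoring with these metrics, assuming the class names from the Ragas 0.2.x docs (AnswerAccuracy, ContextRelevance, ResponseGroundedness) and an OpenAI model as a stand-in judge; verify names against your installed version:

```python
# Score one sample with the NVIDIA metrics in Ragas.
import asyncio
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import AnswerAccuracy, ContextRelevance, ResponseGroundedness
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI  # assumed evaluator backend

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

sample = SingleTurnSample(
    user_input="When was Einstein born?",
    response="Einstein was born in 1879.",
    reference="Albert Einstein was born on 14 March 1879.",
    retrieved_contexts=["Albert Einstein (born 14 March 1879) ..."],
)

async def main():
    for metric in (AnswerAccuracy(llm=evaluator_llm),
                   ContextRelevance(llm=evaluator_llm),
                   ResponseGroundedness(llm=evaluator_llm)):
        print(type(metric).__name__, await metric.single_turn_ascore(sample))

asyncio.run(main())
```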
ragas (@ragas_io)

Learn to use Vertex AI and Ragas to evaluate LLM workflows in this three-part tutorial series. ⭐️

The series covers 👉

1️⃣ Quick Start: learn how to use Vertex AI models with Ragas to evaluate your LLM workflows.

2️⃣ Align LLM Metrics: Train and align your LLM evaluators to
ragas (@ragas_io)

Ragas 🤝 Google Vertex AI

A tutorial showing how to use Vertex AI’s generative models with Ragas.

Learn to configure Vertex AI’s evaluator LLM and embeddings, and conduct evaluations using a comprehensive suite of Ragas metrics—including model-based, computation-based, and
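A rough sketch of the configuration step, assuming the LangChain Vertex AI integration and the Ragas wrapper classes; the model names are placeholders to adapt to your project:

```python
# Wire Vertex AI models into Ragas via the LangChain wrappers.
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

evaluator_llm = LangchainLLMWrapper(ChatVertexAI(model_name="gemini-1.5-pro"))
evaluator_embeddings = LangchainEmbeddingsWrapper(
    VertexAIEmbeddings(model_name="text-embedding-004")
)
# These wrapped objects can then be passed to Ragas metrics or to
# evaluate(..., llm=evaluator_llm, embeddings=evaluator_embeddings).
```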
ragas (@ragas_io)

Misalignment between LLM-based and human evaluators often leads to unreliable results.

Evaluations fall short without aligning the LLM judge with human evaluators. Here’s how you can fix it 👉

1️⃣ Evaluate your data using LLM-based metrics.
2️⃣ Identify and annotate discrepancies
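Step 2 can be as simple as diffing judge scores against human labels to find the cases worth annotating and feeding back into the judge's prompt or few-shot examples. A generic illustration, not a specific Ragas API; all numbers are hypothetical:

```python
# Find where the LLM judge and human annotators disagree.
llm_scores   = [1, 1, 0, 1, 0, 0, 1]  # hypothetical pass/fail from the judge
human_scores = [1, 0, 0, 1, 1, 0, 1]  # hypothetical human labels

disagreements = [i for i, (m, h) in enumerate(zip(llm_scores, human_scores)) if m != h]
agreement = 1 - len(disagreements) / len(llm_scores)

print(f"judge-human agreement: {agreement:.0%}")   # 71% here
print(f"cases to annotate and feed back: {disagreements}")
```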
ragas (@ragas_io)

Lack of test data is one of the main bottlenecks in evaluation - solve this by generating high-quality synthetic test data ⭐

Generate a diverse synthetic test set of single-hop queries using Ragas with this comprehensive guide, demonstrating the Ragas test set generation
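A minimal sketch of the generation flow, with placeholder paths and models; API names are per the Ragas 0.2.x docs and worth verifying against your installed version:

```python
# Generate a synthetic test set from your own documents with Ragas.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator

docs = DirectoryLoader("docs/", glob="**/*.md").load()  # placeholder corpus

generator = TestsetGenerator(
    llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
    embedding_model=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
print(testset.to_pandas().head())
```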
ragas (@ragas_io)

Here are the two different ways to create criteria for evals.

1️⃣ general rubric: uses a global rubric/criteria to evaluate across the entire dataset. Easy to use, but can have limited accuracy in certain aspects.

2️⃣ instance-specific rubrics: uses custom handwritten
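As a rough illustration, a dataset-level rubric in Ragas might look like the following; class and argument names are per the Ragas 0.2.x docs and worth verifying, and the judge model is a placeholder. Instance-specific rubrics attach the rubric text to each sample instead of the metric:

```python
# Sketch of a dataset-level (general) rubric metric in Ragas.
from ragas.metrics import RubricsScore
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI  # assumed judge backend

rubrics = {
    "score1_description": "Response contradicts the reference answer.",
    "score3_description": "Response is partially correct but incomplete.",
    "score5_description": "Response is fully correct and complete.",
}
metric = RubricsScore(
    rubrics=rubrics,
    llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
)
```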
ragas (@ragas_io)

📊 Benchmarking Google Gemini Models on Academic QA using Ragas Metrics

Choose the models that fit your needs best. Benchmark them with the metrics that matter to you.

This tutorial explains how we benchmark Gemini 1.5 Flash and Gemini 2.0 Flash. We use AllenAI’s QASPER
Qdrant (@qdrant_engine)

🧠 Chunking changes everything in RAG. 

This benchmark post evaluated Fixed, Semantic, Agentic, and Recursive chunking in Agentic RAG.

Built with Agno, Qdrant, ragas, and LlamaIndex 🦙.

And measured with relevant metrics: Context Recall, Faithfulness, Factual
ragas (@ragas_io)

We're excited to see our co-founder, Jithin James, featured in a Microsoft for Startups and B Capital white paper! 🔥 It’s all about "RAG and the Future of Intelligent Enterprise Applications."

This white paper provides valuable insights on RAG technology for businesses. It
ragas (@ragas_io)

Use Datadog for LLM observability and Ragas for evaluation. Datadog now allows you to trace and log LLM calls and is integrated with Ragas metrics to evaluate and monitor your AI applications.

Take a look at this thorough guide in Datadog's LLM observability section. Improve
ragas (@ragas_io)

We are hosting our first-ever hackathon ⭐

Join us to learn, build, and hack on evaluating, experimenting with, and improving any LLM Agent using the Ragas App.

Link to the event -> lu.ma/github-hacknig…

📅 When: Thursday, April 17th at 4 PM PST
📍 Where: GitHub HQ, San
ragas (@ragas_io)

LlamaStack + Ragas + Llama 4 = 🚀

LlamaStack is an open-source framework maintained by Meta that streamlines the development and deployment of large language model-powered applications. Use Ragas to evaluate your LlamaStack apps.

This tutorial walks you through:

• Creating a
ragas (@ragas_io)

🔍 Evaluating RAG Pipelines with Ragas + Milvus

Milvus is a powerful open-source vector database that excels at similarity search and scales beautifully for production use. When combined with Ragas, you can:

✅ Measure the effectiveness of retrieval

✅ Ensure the generation's
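A hypothetical end-to-end sketch of that evaluation step, with a hand-written sample standing in for real Milvus retrieval results; the models and metric choice here are assumptions:

```python
# Score retrieval and generation quality of a RAG pipeline with Ragas.
from ragas import EvaluationDataset, evaluate
from ragas.metrics import LLMContextRecall, Faithfulness
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI  # assumed judge backend

samples = [{
    "user_input": "What is Milvus?",
    "retrieved_contexts": ["Milvus is an open-source vector database ..."],
    "response": "Milvus is an open-source vector database for similarity search.",
    "reference": "Milvus is an open-source vector database.",
}]

dataset = EvaluationDataset.from_list(samples)
result = evaluate(
    dataset=dataset,
    metrics=[LLMContextRecall(), Faithfulness()],
    llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
)
print(result)
```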
ragas (@ragas_io)

📊 Measure What Matters: Griptape + Ragas

Evaluate Griptape's RAG Engines with Ragas integration - get quantifiable metrics on retrieval accuracy and response quality with minimal setup.

Griptape is a powerful framework for Gen AI application development. Check out our new
ragas (@ragas_io)

🧠 Paper Club Alert! We're discussing "The Illusion of Thinking" - Apple's controversial paper on why LRMs ace easy puzzles but crash on hard ones.

Join us July 3 @ 9:30 AM PT for:
- Overview
- Chain-of-thought limitations
- Real implications for AI in prod

Free on Zoom: