Shrusti Ghela (@shrusti_ghela) Twitter Tweets • TwiCopy

𝚐𝔪𝟾𝚡𝚡𝟾

a year ago

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
HALoGEN is a benchmark to evaluate hallucinations in LLMs. It includes 10,923 prompts across nine domains and automated verifiers to validate model outputs against reliable sources. Tests on ~150,000 outputs from 14

15 likes, 1 reply, 4 retweets

fly51fly

@fly51fly

a year ago

[CL] HALoGEN: Fantastic LLM Hallucinations and Where to Find Them A Ravichander, S Ghela, D Wadden, Y Choi [Google & University of Washington] (2025) arxiv.org/abs/2501.08292

8 likes, 0 replies, 5 retweets

Wayne Radinsky

@waynerad

a year ago

"HALoGEN: Fantastic LLM hallucinations and where to find them". "HALoGEN" stands for "evaluating Hallucinations of Generative Models". It consists of: "a (1) 10,923 prompts for generative models spanning nine domains including programming, scientific attribution, and

2 likes, 1 reply, 3 retweets

Rohan Paul

@rohanpaul_ai

a year ago

HALoGEN is a comprehensive benchmark with automated verifiers that decomposes LLM outputs into atomic facts and analyzes them to detect and classify hallucinations across diverse tasks.
Methods in this Paper 🔧:
→ HALoGEN tests LLMs on 9 different domains like coding,

55 likes, 5 replies, 14 retweets

Tuan Truong

@tuantruong

a year ago

🔥 Top 10 LLM Papers This Week:
1. SteLLA: Structured Grading w/ RAG
2. LLMs as Judges of Textual Data
3. Agentic RAG Survey
4. Authenticated AI Agents
5. Enhancing Human-Like LLM Responses
6. WebWalker: LLM Web Traversal
7. HALoGEN: Finding Hallucinations
8. Multiagent

9 likes, 2 replies, 5 retweets

Abhilasha Ravichander

@lasha_nlp

a year ago

We are launching HALoGEN💡, a way to systematically study *when* and *why* LLMs still hallucinate. New work w/ Shrusti Ghela* David Wadden Yejin Choi 💫 🧵 [1/n]