Shrusti Ghela (@shrusti_ghela)'s Twitter Profile
Shrusti Ghela

@shrusti_ghela

ID: 1620900572911276032

Joined: 01-02-2023 21:43:19

8 Tweets

12 Followers

205 Following

𝚐π”ͺ𝟾𝚑𝚑𝟾 (@gm8xx8) 's Twitter Profile Photo

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them

HALoGEN is a benchmark to evaluate hallucinations in LLMs. It includes 10,923 prompts across nine domains and automated verifiers to validate model outputs against reliable sources. Tests on ~150,000 outputs from 14…

fly51fly (@fly51fly)

[CL] HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
A Ravichander, S Ghela, D Wadden, Y Choi [Google & University of Washington] (2025)
arxiv.org/abs/2501.08292

Wayne Radinsky (@waynerad)

"HALoGEN: Fantastic LLM hallucinations and where to find them". "HALoGEN" stands for "evaluating Hallucinations of Generative Models". It consists of: "(1) 10,923 prompts for generative models spanning nine domains including programming, scientific attribution, and…

Rohan Paul (@rohanpaul_ai)

HALoGEN is a comprehensive benchmark whose automated verifiers decompose LLM outputs into atomic facts and analyze them to detect and classify hallucinations across diverse tasks.

Methods in this paper 🔧:

→ HALoGEN tests LLMs on 9 different domains, such as coding,…

Tuan Truong (@tuantruong)

🔥 Top 10 LLM Papers This Week:
1. SteLLA: Structured Grading w/ RAG
2. LLMs as Judges of Textual Data
3. Agentic RAG Survey
4. Authenticated AI Agents
5. Enhancing Human-Like LLM Responses
6. WebWalker: LLM Web Traversal
7. HALoGEN: Finding Hallucinations
8. Multiagent…

Rob Freund (@robertfreundlaw)

I don't know why lawyers keep blindly relying on AI for case citations, knowing that judges are going to check, but here's another one from today: