BIU NLP (@biunlp)'s Twitter Profile

BIU NLP
@biunlp

The Bar-Ilan University Natural Language Processing group.

ID: 347798643
Link: https://biu-nlp.github.io/
Joined: 03-08-2011 11:21:06

192 Tweets · 740 Followers · 103 Following

Eran Hirsch (@hirscheran):

🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)!

LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top) and get the attributing input snippet (bottom). This reduces the amount of text the user has to read by 2…
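
To make the interaction concrete, here is a minimal sketch of highlight-to-source attribution. It is not LAQuer's actual method: the lexical-overlap scorer and the function names are stand-ins for the model-based alignment the paper studies.

```python
# Hypothetical sketch: the user highlights an output fact, and we return
# the source sentence that best supports it. Plain token overlap keeps
# the example self-contained; real systems use model-based alignment.

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between lowercased token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def attribute_fact(fact: str, source_sentences: list[str]) -> str:
    """Return the source sentence most similar to the highlighted fact."""
    return max(source_sentences, key=lambda s: token_overlap(fact, s))

source = [
    "The committee approved the new budget on Monday",
    "The plan allocates two million dollars to road repairs",
]
print(attribute_fact("the budget was approved on Monday", source))
```
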
Elias Stengel-Eskin (on the faculty job market) (@eliaseskin):

Attribution is key to auditing an LLM's generations, but it is only useful when it is high-precision. LAQuer requires models to attribute arbitrary spans in the output to fine-grained source spans, making checking the output for factuality faster and easier on…

David Wan (@meetdavidwan):

Verifying LLM-generated facts can be a slog through lengthy citations. LAQuer introduces a novel framework, allowing users to simply highlight an output fact and get precise, snippet-level attribution! This massively cuts down the reading needed for verification, by up to 2…

Elias Stengel-Eskin (on the faculty job market) (@eliaseskin):

🚨 CLATTER treats entailment as a reasoning process, guiding models to follow concrete steps (decomposition, attribution/entailment, and aggregation). CLATTER improves hallucination detection via NLI, with gains on ClaimVerify, LFQA, and TofuEval, especially on long-reasoning…

Eran Hirsch (@hirscheran):

🚨 New preprint! We propose a reasoning process for hallucination detection:
1️⃣ Decompose the output
2️⃣ Generate fine-grained attribution (if possible), and accordingly make local entailment decisions
3️⃣ Aggregate all into a final decision
We also introduce metrics to evaluate…
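
A minimal sketch of this three-step recipe, assuming hypothetical helpers: decompose, attribute, and entails stand in for the LLM and NLI calls, and the "any unsupported claim flags the output" aggregation is one illustrative choice, not necessarily the paper's.

```python
from typing import Callable, Optional

def detect_hallucination(
    source: str,
    output: str,
    decompose: Callable[[str], list[str]],           # output -> atomic claims
    attribute: Callable[[str, str], Optional[str]],  # (claim, source) -> snippet or None
    entails: Callable[[str, str], bool],             # (premise, claim) -> verdict
) -> bool:
    """Return True if any atomic claim in `output` is unsupported by `source`."""
    for claim in decompose(output):                  # 1. decompose the output
        snippet = attribute(claim, source)           # 2a. fine-grained attribution
        premise = snippet if snippet is not None else source
        if not entails(premise, claim):              # 2b. local entailment decision
            return True                              # 3. aggregate: one unsupported
    return False                                     #    claim flags a hallucination
```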

David Wan (@meetdavidwan):

Excited to share GenerationPrograms! 🚀

How do we get LLMs to cite their sources? GenerationPrograms is attributable by design, producing a program that executes to text w/ a trace of how the text was generated! Gains of up to +39 Attribution F1, and it eliminates uncited sentences…
David Wan (@meetdavidwan):

🎉 Our paper, GenerationPrograms, which proposes a modular framework for attributable text generation, has been accepted to the Conference on Language Modeling (COLM)! GenerationPrograms produces a program that executes to text, providing an auditable trace of how the text was generated and major gains on…
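
A toy sketch of the "program that executes to text" idea: the program is a sequence of operations over numbered source sentences, so every output sentence carries a trace of exactly which inputs produced it. The operation names and the join-based fusion are illustrative placeholders, not the paper's actual modules.

```python
SOURCES = {
    1: "The committee approved the budget on Monday.",
    2: "The budget allocates two million dollars to road repairs.",
}

def copy_op(ids: list[int]) -> str:
    return " ".join(SOURCES[i] for i in ids)

def fuse_op(ids: list[int]) -> str:
    # A real system would call an LLM fusion module here; joining the
    # sentences keeps the sketch runnable without one.
    return " ".join(SOURCES[i] for i in ids)

OPS = {"copy": copy_op, "fuse": fuse_op}
PROGRAM = [("copy", [1]), ("fuse", [1, 2])]  # the "generation program"

for op_name, ids in PROGRAM:
    text = OPS[op_name](ids)
    # The (operation, source ids) pair is the auditable trace.
    print(f"{text}  [trace: {op_name}{ids}]")
```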

Tzuf - צוף (@tzuf6):

With BIU NLP, Itai Mondshine, and Reut Tsarfaty. Our paper "Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization": arxiv.org/pdf/2507.08342

Amir David Nissan cohen (@amirdnc):

Led by Aviya Maimon, our new paper redefines how we evaluate LLMs. Instead of one flat leaderboard score, we uncover the latent skills—reasoning, comprehension, ethics, precision & more—that really shape LLM ability. Think: psychometrics meets AI. link: arxiv.org/pdf/2507.20208

Shauli Ravfogel (@ravfogel):

1/8 Happy to share our new paper—“IQ Test for LLMs”—co-authored with Aviya Maimon, Amir David Nissan cohen, Gal Vishne @neurogal.bsky.social and Reut Tsarfaty. We propose to rethink how language models are evaluated by focusing on the latent capabilities that explain benchmark results. Arxiv: arxiv.org/pdf/2507.20208

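To illustrate the psychometric framing, here is a minimal sketch that treats a models-by-benchmarks score matrix like test-taker data and extracts latent factors. The synthetic data and the choice of plain factor analysis are assumptions for the example; the paper's actual skill set and estimation method may differ.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
scores = rng.random((20, 8))  # 20 models x 8 benchmark scores (synthetic)

fa = FactorAnalysis(n_components=3, random_state=0)
abilities = fa.fit_transform(scores)  # per-model latent "skill" estimates
loadings = fa.components_             # how each benchmark loads on each skill

print(abilities.shape, loadings.shape)  # (20, 3) (3, 8)
```
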
Avshalom Manevich (@avshalomm):

We introduce CoCI, which improves fine-grained visual discrimination in LVLMs using contrast images. It shows up to a 98.9% improvement on NaturalBench across three different supervision regimes. With Reut Tsarfaty. 📄 aclanthology.org/anthology-file…
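
As a rough sketch of how a contrast image can sharpen discrimination (one reading of the idea, not necessarily CoCI's exact procedure): prefer the caption the model supports on the original image but not on a minimally different contrast image. The score function is a hypothetical stand-in for an LVLM's image-text likelihood.

```python
from typing import Callable

def pick_caption(
    score: Callable[[object, str], float],  # hypothetical LVLM image-text scorer
    image: object,
    contrast_image: object,                 # minimally different counterpart
    captions: list[str],
) -> str:
    """Prefer the caption whose support drops most on the contrast image:
    a caption that fits both images does not discriminate the difference."""
    return max(captions, key=lambda c: score(image, c) - score(contrast_image, c))
```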