Benjamin Van Durme (@ben_vandurme)'s Twitter Profile
Benjamin Van Durme

@ben_vandurme

Johns Hopkins / Microsoft

ID: 17988541

Joined: 09-12-2008 10:04:30

158 Tweets

1.1K Followers

70 Following

Mustafa Suleyman (@mustafasuleyman)

You can't just be right, you have to know you're right. Good advice for LLMs, according to new Johns Hopkins research. Sometimes no answer is better than a wrong one – life or death choices in medicine, for example, or big financial decisions. 🧵

Alexander Martin (@alexdmartin314)

Wish you could get a Wikipedia style article for unfolding events? Introducing WikiVideo: a new multimodal task and benchmark for Wikipedia-style article generation from multiple videos!

Satya Nadella (@satyanadella)

2. Copilot Tuning: Copilot can now learn your company’s unique tone and language. It is all about taking that expertise you have as a firm and further amplifying it so everyone has access.

Eugene Yang (@eyangtw)

🚨Wouldn’t it be nice if your agentic search system could reason over all your docs? ✨Introducing Rank-K, a listwise reranker that benefits from test-time compute and long-context! Rank-K sets a new SoTA for reasoning-based reranking, without reasoning chains from other models.

Benjamin Van Durme (@ben_vandurme)

Will continues to drive great work in the modular use of adapters. From security benefits in AdapterSwap arxiv.org/abs/2404.08417; to RE-adapting arxiv.org/abs/2405.15007 arxiv.org/abs/2406.14764; to the COLM '25 SpectR arxiv.org/abs/2504.03454 that enables this new result LAG.

Benjamin Van Durme (@ben_vandurme)

I am growing an R&D team around Copilot Tuning, a newly announced effort that supports adaptation at a customer-specific level. Join us! jobs.careers.microsoft.com/global/en/job/… We collaborate with a crack team of eng and scientists that support the product, also growing! jobs.careers.microsoft.com/global/en/job/…

Johns Hopkins Data Science and AI Institute (@hopkinsdsai)

#HopkinsDSAI welcomes 22 new faculty members, who join more than 150 DSAI faculty members across Johns Hopkins University in advancing the study of data science, machine learning, and #AI and translation to a range of critical and emerging fields. ai.jhu.edu/news/data-scie…

Jack Jingyu Zhang @ NAACL🌵 (@jackjingyuzhang)

Introducing 𝐉𝐚𝐢𝐥𝐛𝐫𝐞𝐚𝐤 𝐃𝐢𝐬𝐭𝐢𝐥𝐥𝐚𝐭𝐢𝐨𝐧 🧨 (EMNLP '25 Findings) We propose a generate-then-select pipeline to "distill" effective jailbreak attacks into safety benchmarks, ensuring eval results are reproducible and robust to benchmark saturation & contamination🧵
