Benjamin Van Durme (@ben_vandurme) Twitter Tweets • TwiCopy

Mustafa Suleyman

8 months ago

You can't just be right, you have to know you're right. Good advice for LLMs, according to new Johns Hopkins research. Sometimes no answer is better than a wrong one – life-or-death choices in medicine, for example, or big financial decisions. 🧵

251 likes · 29 replies · 45 reposts

Alexander Martin

@alexdmartin314

7 months ago

Wish you could get a Wikipedia-style article for unfolding events? Introducing WikiVideo: a new multimodal task and benchmark for Wikipedia-style article generation from multiple videos!

23 likes · 2 replies · 13 reposts

William Fleshman

@willcfleshman

7 months ago

🚨 Our latest paper is now on ArXiv! 👻 (w/ Benjamin Van Durme) SpectR: Dynamically Composing LM Experts with Spectral Routing (1/4) 🧵

23 likes · 1 reply · 12 reposts

Satya Nadella

@satyanadella

6 months ago

2. Copilot Tuning: Copilot can now learn your company’s unique tone and language. It is all about taking that expertise you have as a firm and further amplifying it so everyone has access.

999 likes · 1 reply · 57 reposts

Eugene Yang

@eyangtw

6 months ago

🚨Wouldn’t it be nice if your agentic search system could reason over all your docs? ✨Introducing Rank-K, a listwise reranker that benefits from test-time compute and long-context! Rank-K sets a new SoTA for reasoning-based reranking, without reasoning chains from other models.

190 likes · 2 replies · 28 reposts

John Langford

@johnclangford

5 months ago

A new opening for multimodal model research: jobs.careers.microsoft.com/global/en/job/… . Please apply if interested.

60 likes · 2 replies · 10 reposts

Benjamin Van Durme

@ben_vandurme

4 months ago

Will continues to drive great work on the modular use of adapters: from the security benefits of AdapterSwap (arxiv.org/abs/2404.08417), to RE-adapting (arxiv.org/abs/2405.15007, arxiv.org/abs/2406.14764), to the COLM '25 paper SpectR (arxiv.org/abs/2504.03454), which enables this new result, LAG.

4 likes · 0 replies · 0 reposts

Benjamin Van Durme

@ben_vandurme

4 months ago

Ettin, a two-headed giant ... language model en.wikipedia.org/wiki/Ettin

9 likes · 0 replies · 3 reposts

Benjamin Van Durme

@ben_vandurme

3 months ago

I am growing an R&D team around Copilot Tuning, a newly announced effort that supports adaptation at a customer-specific level. Join us! jobs.careers.microsoft.com/global/en/job/… We collaborate with a crack team of eng and scientists that support the product, also growing! jobs.careers.microsoft.com/global/en/job/…

72 likes · 0 replies · 15 reposts

Benjamin Van Durme

@ben_vandurme

3 months ago

From now on in my advising meetings, any negative result will be met with my response of "think deeper"

24 likes · 1 reply · 2 reposts

Johns Hopkins Data Science and AI Institute

@hopkinsdsai

3 months ago

#HopkinsDSAI welcomes 22 new faculty members, who join more than 150 DSAI faculty members across Johns Hopkins University in advancing the study of data science, machine learning, and #AI and translation to a range of critical and emerging fields. ai.jhu.edu/news/data-scie…

191 likes · 3 replies · 27 reposts

Jack Jingyu Zhang @ NAACL🌵

@jackjingyuzhang

3 months ago

Introducing Jailbreak Distillation 🧨 (EMNLP '25 Findings). We propose a generate-then-select pipeline to "distill" effective jailbreak attacks into safety benchmarks, ensuring eval results are reproducible and robust to benchmark saturation & contamination. 🧵