David Lindner (@davlindner)'s Twitter Profile
David Lindner

@davlindner

Making AI safer @GoogleDeepMind

ID: 551165404

Website: http://davidlindner.me

Joined: 11-04-2012 17:22:13

147 Tweets

1.1K Followers

325 Following

David Lindner (@davlindner)

Had a great conversation with Daniel about our MONA paper. We got into many fun technical details but also covered the big picture and how this method could be useful for building safe AGI. Thanks for having me on!

Daniel Filan (@dfrsrchtwts)

New episode with Samuel Albanie 🇬🇧, where we discuss the recent Google DeepMind paper "An Approach to Technical AGI Safety and Security"! Link to watch below.

Scott Emmons (@emmons_scott)

Is CoT monitoring a lost cause due to unfaithfulness? 🤔

We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes!

Our finding: "When Chain of Thought is Necessary, Language Models
Rohin Shah (@rohinmshah)

Two new papers that elaborate on our approach to deceptive alignment! First paper: we evaluate the model's *stealth* and *situational awareness* -- if they don't have these capabilities, they likely can't cause severe harm. x.com/vkrakovna/stat…

METR (@metr_evals)

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.

The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
David Lindner (@davlindner)

I'll be at ICML this week, looking forward to catching up with old friends and meeting new faces. Lmk if you want to chat!

David Lindner (@davlindner)

I'll be presenting MONA at ICML in the afternoon poster session today. Come stop by from 4:30 pm at East Exhibition Hall E-902
