Noah Y. Siegel (@noahysiegel)'s Twitter Profile
Noah Y. Siegel

@noahysiegel

Research Engineer @GoogleDeepMind. EA, vegan, giving 10% of my income to effective animal welfare charities. Let's make AGI go well for all sentient beings!

ID: 4549583660

Joined: 13-12-2015 18:51:32

23 Tweets

170 Followers

115 Following

Zac Kenton (@zackenton1)'s Twitter Profile Photo

Eventually, humans will need to supervise superhuman AI - but how? Can we study it now? We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself. Does this work? It’s complicated: 🧵👇

Anca Dragan (@ancadianadragan)'s Twitter Profile Photo

So freaking proud of the AGI safety&alignment team -- read here a retrospective of the work over the past 1.5 years across frontier safety, oversight, interpretability, and more. Onwards! alignmentforum.org/posts/79BPxvSs…

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Employees of frontier AI labs are in a unique position to understand the potential impact of the most advanced AI models and their perspectives on this matter must be taken into account. I strongly encourage Governor Gavin Newsom to sign SB 1047 into law. calltolead.org

David Lindner (@davlindner)'s Twitter Profile Photo

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in 🧵

Zac Kenton (@zackenton1)'s Twitter Profile Photo

We're hiring for our Google DeepMind AGI Safety & Alignment and Gemini Safety teams. Locations: London, NYC, Mountain View, SF. Join us to help build safe AGI. Research Engineer boards.greenhouse.io/deepmind/jobs/…… Research Scientist boards.greenhouse.io/deepmind/jobs/…

Rohin Shah (@rohinmshah)'s Twitter Profile Photo

We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.

Arthur Conmy (@arthurconmy)'s Twitter Profile Photo

We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with Neel Nanda, the Google DeepMind AGI Safety team, and me: apply by 28th February as a

Fazl Barez (@fazlbarez)'s Twitter Profile Photo

Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵

Chloe Li (@clippocampus)'s Twitter Profile Photo

Can LLMs covertly sandbag on capability evaluations against CoT monitoring? Find out on Saturday at the Technical AI Governance workshop at ICML 2025! I’ll be giving a talk at 10:50 about work done on this by me, Mary Phuong and Noah Y. Siegel. Swing by and chat to me in person or on DMs

Noah Y. Siegel (@noahysiegel)'s Twitter Profile Photo

Excited to be a SPAR mentor this Fall, come work with me on figuring out how to measure explanatory faithfulness for LLMs!