Jacob Pfau (@jacob_pfau)'s Twitter Profile
Jacob Pfau

@jacob_pfau

Alignment at UKAISI and PhD student at NYU

ID: 1145186034042281984

https://jacobpfau.com/ · Joined 30-06-2019 04:23:21

710 Tweets

1.1K Followers

1.1K Following

Jacob Pfau (@jacob_pfau)'s Twitter Profile Photo

Geoffrey’s thread gives a great overview of how our safety case carves up the alignment via debate agenda into modular, parallelizable subproblems!

Benjamin Hilton (@benjamin_hilton)'s Twitter Profile Photo

Humans are often very wrong. This is a big problem if you want to use human judgment to oversee super-smart AI systems. In our new post, Geoffrey Irving argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.

William Merrill (@lambdaviking)'s Twitter Profile Photo

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀 New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵

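As a rough illustration of the padding idea above, here is a minimal sketch that appends blank filler tokens to a prompt before generation, giving the model extra forward-pass positions that carry no informative content. The model name, the use of "." as the filler token, and the pad count are assumptions for illustration only, not the setup studied in the paper (which characterizes expressive power theoretically).

```python
# Minimal sketch: append blank filler tokens to a prompt as a simple form of
# test-time compute. Model choice, filler token, and pad count are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM works for this illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: Is 291 divisible by 3? A:"
n_pad = 16  # number of blank tokens appended as extra computation positions

# Append filler tokens after the prompt; the model gets n_pad extra positions
# of forward-pass computation before it must produce an answer token.
padded_prompt = prompt + " " + ". " * n_pad
inputs = tokenizer(padded_prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)

# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```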
Benjamin Hilton (@benjamin_hilton)'s Twitter Profile Photo

Come work with me!! I'm hiring a research manager for AI Security Institute's Alignment Team. You'll manage exceptional researchers tackling one of humanity’s biggest challenges. Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks. 1/4

Geoffrey Irving (@geoffreyirving)'s Twitter Profile Photo

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

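To give a rough sense of the shape of such a protocol, here is a minimal, illustrative recursion sketch in Python. The `prover`, `estimator`, and `judge` callables, the depth cutoff, and the rule of recursing on the least-confident subclaim are hypothetical stand-ins for exposition; the actual prover-estimator protocol, its incentive structure, and the equilibrium proof are as specified in the paper, not here.

```python
# Illustrative sketch of a prover-estimator style recursion (not the paper's protocol).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str


def debate(
    claim: Claim,
    prover: Callable[[Claim], List[Claim]],   # hypothetical: decomposes a claim into subclaims
    estimator: Callable[[Claim], float],      # hypothetical: estimated probability a subclaim holds
    judge: Callable[[Claim], bool],           # hypothetical: direct check on a small leaf claim
    depth: int,
) -> bool:
    """Recursively probe a claim by expanding the subclaim the estimator is
    least confident in, until a leaf claim can be checked directly."""
    if depth == 0:
        return judge(claim)
    subclaims = prover(claim)
    if not subclaims:
        return judge(claim)
    weakest = min(subclaims, key=estimator)
    return debate(weakest, prover, estimator, judge, depth - 1)
```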