Jacob Pfau (@jacob_pfau)'s Twitter Profile
Jacob Pfau

@jacob_pfau

Alignment at UKAISI and PhD student at NYU

ID: 1145186034042281984

https://jacobpfau.com/ · Joined 30-06-2019 04:23:21

710 Tweets

1.1K Followers

1.1K Following

Jacob Pfau (@jacob_pfau)'s Twitter Profile Photo

Geoffrey’s thread gives a great overview of how our safety case carves up the alignment via debate agenda into modular, parallelizable subproblems!

Benjamin Hilton (@benjamin_hilton)'s Twitter Profile Photo

Humans are often very wrong. This is a big problem if you want to use human judgment to oversee super-smart AI systems. In our new post, Geoffrey Irving argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.

William Merrill (@lambdaviking)'s Twitter Profile Photo

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀 New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵

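As a rough illustration of the padding idea above, here is a minimal sketch that appends blank filler tokens to a prompt before generation, giving the model extra forward-pass positions that carry no informative content. The model name, the use of "." as the filler token, and the pad count are assumptions for illustration only, not the setup studied in the paper (which characterizes expressive power theoretically).

```python
# Minimal sketch: append blank filler tokens to a prompt as a simple form of
# test-time compute. Model choice, filler token, and pad count are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM works for this illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: Is 291 divisible by 3? A:"
n_pad = 16  # number of blank tokens appended as extra computation positions

# Append filler tokens after the prompt; the model gets n_pad extra positions
# of forward-pass computation before it must produce an answer token.
padded_prompt = prompt + " " + ". " * n_pad
inputs = tokenizer(padded_prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)

# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```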
Benjamin Hilton (@benjamin_hilton)'s Twitter Profile Photo

Come work with me!! I'm hiring a research manager for AI Security Institute's Alignment Team. You'll manage exceptional researchers tackling one of humanity’s biggest challenges. Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks. 1/4

Geoffrey Irving (@geoffreyirving)'s Twitter Profile Photo

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

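To give a rough sense of the shape of such a protocol, here is a minimal, illustrative recursion sketch in Python. The `prover`, `estimator`, and `judge` callables, the depth cutoff, and the rule of recursing on the least-confident subclaim are hypothetical stand-ins for exposition; the actual prover-estimator protocol, its incentive structure, and the equilibrium proof are as specified in the paper, not here.

```python
# Illustrative sketch of a prover-estimator style recursion (not the paper's protocol).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str


def debate(
    claim: Claim,
    prover: Callable[[Claim], List[Claim]],   # hypothetical: decomposes a claim into subclaims
    estimator: Callable[[Claim], float],      # hypothetical: estimated probability a subclaim holds
    judge: Callable[[Claim], bool],           # hypothetical: direct check on a small leaf claim
    depth: int,
) -> bool:
    """Recursively probe a claim by expanding the subclaim the estimator is
    least confident in, until a leaf claim can be checked directly."""
    if depth == 0:
        return judge(claim)
    subclaims = prover(claim)
    if not subclaims:
        return judge(claim)
    weakest = min(subclaims, key=estimator)
    return debate(weakest, prover, estimator, judge, depth - 1)
```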