Ben Plaut (@benplaut) Twitter Tweets • TwiCopy

Ben Plaut

@benplaut


Postdoc in AI safety @ CHAI, UC Berkeley | Maybe the real neural network was the friends we made along the way | bplaut.github.io

ID: 1629320899324157952

Joined 25-02-2023 03:22:55

17 Tweets

25 Followers

33 Following

Michael Cohen

@michael05156007

a year ago

We sent this letter to Gavin Newsom this morning. He should sign SB 1047! 🧵

113 likes · 11 replies · 33 retweets

Micah Carroll

@micahcarroll

10 months ago

Center for Human-Compatible AI applications for 2025 close in just over a day! ⏰‼️ Apply now! Details below:

27 likes · 1 reply · 13 retweets

Aly Lidayan @ ICLR

🚨Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of the RL agent’s state🤖=external state🌎+past experience🧠. Rewards based on potentials over the learning agent’s state provably avoid reward hacking!🧵

113 likes · 3 replies · 32 retweets

Cassidy Laidlaw

@cassidy_laidlaw

4 months ago

We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and jumps in to help. This assistant *wasn't* trained with RLHF. Instead, it's powered by *assistance games*, a better path forward for building AI assistants. 🧵

2.2K likes · 90 replies · 217 retweets

Karim Abdel Sadek

@karim_abdelll

23 days ago

*New AI Alignment Paper* 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the human's intended goal. 😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!

136 likes · 9 replies · 27 retweets