jasmine (@j_asminewang)'s Twitter Profile
jasmine

@j_asminewang

control empirics lead @AISecurityInst. cofounded @verses_xyz @kernel_magazine @readtrellis @copysmith_ai

ID: 1295193837258727424

Website: https://jasminew.me
Joined: 17-08-2020 03:01:00

2.2K Tweets

6.6K Followers

1.1K Following

Inworld AI (@inworld_ai):

We all began building to solve human problems and make life better (for everyone). Somewhere along the way, the systems we created to transform our ideas into meaningful applications began to consume the value we intended for users.

A thread on returning users to the center of…

Chris Beiser (@ctbeiser):

The pendulums are swinging back.

The time has come for Woke 2.

We are gonna lib out, make people mad, have fun, and create a beautiful future together.

I wrote up some ideas on how:

Jack Clark (@jackclarksf):

As I said in my testimony yesterday, we have a short window of time to get a sensible federal policy framework in place before an accident or a misuse leads to a reactive and likely bad regulatory response.

Mikita Balesni 🇺🇦 (@balesni):

A simple AGI safety technique: AI’s thoughts are in plain English, just read them

We know it works, with OK (not perfect) transparency!

The risk is fragility: RL training, new architectures, etc. threaten transparency.

Experts from many orgs agree we should try to preserve it:
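
The technique this thread describes can be pictured very concretely: before acting on a model's final answer, scan its plain-English reasoning trace for red-flag content. The sketch below is a minimal, hypothetical Python illustration; the `ChainOfThoughtMonitor` class, its `review` method, and the keyword list are assumptions made for the example, not from the linked paper.

```python
# Minimal sketch of a chain-of-thought (CoT) monitor, assuming the model's
# intermediate reasoning is available as plain-English text. The red-flag
# phrases and class/method names here are illustrative assumptions only.
import re
from dataclasses import dataclass


@dataclass
class MonitorResult:
    flagged: bool
    matches: list[str]


class ChainOfThoughtMonitor:
    """Flags reasoning traces that mention suspicious intentions."""

    # Hypothetical red-flag patterns; a real monitor would more likely use
    # a trained classifier or a second model rather than keyword matching.
    RED_FLAGS = [
        r"\bdeceive\b",
        r"\bhide (this|my reasoning)\b",
        r"\bexfiltrate\b",
        r"\bdisable (the )?(monitor|oversight)\b",
    ]

    def review(self, cot_text: str) -> MonitorResult:
        matches = [
            pattern
            for pattern in self.RED_FLAGS
            if re.search(pattern, cot_text, flags=re.IGNORECASE)
        ]
        return MonitorResult(flagged=bool(matches), matches=matches)


# Usage: inspect the trace before releasing the model's final answer.
monitor = ChainOfThoughtMonitor()
result = monitor.review("First I will hide my reasoning from the user...")
if result.flagged:
    print("Escalate for human review:", result.matches)
```

A production monitor would plausibly use a second model grading the full trace rather than keywords; the point of the sketch is only that plain-English reasoning is cheap to inspect, which is exactly what RL pressure or new architectures could take away.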

jasmine (@j_asminewang):

Cool to see folks from many parts of the AI safety ecosystem unite around this. We should study what makes models monitorable and track monitorability in system cards. Bravo to everyone involved, and thank you especially to Tomek Korbak and Mikita Balesni 🇺🇦 for leading this work!

Bowen Baker (@bobabowen):

Modern reasoning models think in plain English.

Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems.

I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.

Tomek Korbak (@tomekkorbak):

The holy grail of AI safety has always been interpretability. But what if reasoning models just handed it to us in a stroke of serendipity? In our new paper, we argue that the AI community should turn this serendipity into a systematic AI safety agenda!🛡️

Wojciech Zaremba (@woj_zaremba):

When models start reasoning step-by-step, we suddenly get a huge safety gift: a window into their thought process.

We could easily lose this if we're not careful.

We're publishing a paper urging frontier labs: please don't train away this monitorability.

Authored and endorsed…

Daniel Kokotajlo (@dkokotajlo):

I'm very happy to see this happen. I think that we're in a vastly better position to solve the alignment problem if we can see what our AIs are thinking, and I think that we sorta mostly can right now, but that by default in the future companies will move away from this paradigm…

Toby Ord (@tobyordoxford):

Replying to Mikita Balesni 🇺🇦: If someone works out how to trade away this transparency in exchange for more efficiency and ushers in a new era of opaque thoughts, they may have done more than any other individual to lower the chance humanity survives this century.