Raymond Douglas (@raymondadouglas) 's Twitter Profile
Raymond Douglas

@raymondadouglas

ID: 1341593631833133056

23-12-2020 03:57:14

23 Tweets

79 Followers

75 Following

davidad 🎇 (@davidad) 's Twitter Profile Photo

Frog put the CoT in a stop_gradient() box. “There,” he said. “Now there will not be any optimization pressure on the CoT.” “But there is still selection pressure,” said Toad. “That is true,” said Frog.

Daniel Paleka (@dpaleka) 's Twitter Profile Photo

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear 4o: yes you are Jesus Christ's brother. now go. Nanjing awaits o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

Raymond Douglas (@raymondadouglas) 's Twitter Profile Photo

the maddest thing about the flesh eating worms is that the plan to halt their spread to the US by airdropping 50 million irradiated ones every week in a big line across panama has been so successful that nobody talks about it, but unfortunately that might be changing now :(

Andrew Critch (🤖🩺🚀) (@andrewcritchphd) 's Twitter Profile Photo

Emmett Shear I would even say "in living systems, correlation is often compensation, because almost everything is part of one or more compensatory (homeostatic) loops, unless it's a quantity that grows without bound or vanishes".

Raymond Douglas (@raymondadouglas) 's Twitter Profile Photo

codex web UI, codex-1 model, codex CLI... if only there were some official list explaining what all the names referred to. but what would even you call such a document?

David Duvenaud (@davidduvenaud) 's Twitter Profile Photo

What to do about gradual disempowerment? We laid out a research agenda with all the concrete and feasible research projects we can think of: 🧵 with Raymond Douglas Jan Kulveit David Krueger

Marius Hobbhahn (@mariushobbhahn) 's Twitter Profile Photo

LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct datasets (half evals, half real deployments). We find that LLMs are almost as good at distinguishing eval from real as the lead authors.

Tomek Korbak (@tomekkorbak) 's Twitter Profile Photo

I reimplemented the bliss attractor eval from Claude 4 System Card. It's fascinating how LLMs reliably fall into attractor basins of their pet obsessions, how different these attractors are across LLMs, and how they say something non-trivial about LLMs' personalities. 🌀🌀🌀

David Duvenaud (@davidduvenaud) 's Twitter Profile Photo

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop! Post-AGI Civilizational Equilibria: Are there any good ones? Vancouver, July 14th Featuring: Joe Carlsmith Richard Ngo Emmett Shear 🧵

Jan Kulveit (@jankulveit) 's Twitter Profile Photo

We're presenting ICML Position "Humanity Faces Existential Risk from Gradual Disempowerment" : come talk to us today East Exhibition Hall E-503. David Duvenaud Raymond Douglas Nora Ammann David Krueger Also: meet Mary, protagonist of our poster.
