
Victor Veitch 🔸
@victorveitch
AI | University of Chicago / Google DeepMind
ID: 1400175774
http://victorveitch.com
03-05-2013 16:54:37
1.1K Tweets
4.4K Followers
1.1K Following


I really like this new op-ed from David Duvenaud on how many different kinds of pressure could drive a loss of human control over AI. It's rare to read anything well-written on this topic, but this piece was elegant and smart enough that I wanted to keep reading.


Secure LLMs must separate roles. Finetuning improves security benchmark scores, but do models really learn role separation? 🤔 Our paper reveals an 'Illusion of Role Separation'! 🧵 (1/N) #AISafety w/ Yibo Jiang, Hubert Yoo, metasec arxiv.org/pdf/2505.00626





The Eleos AI Research team conducted “welfare interviews” with Anthropic’s Claude Opus 4 about its potential moral status 💬, the first external welfare evaluation of a frontier model.

This thread:
- interviews have clear limitations, but they're still worth doing
- what we found



