Jan Leike (@janleike)'s Twitter Profile
Jan Leike

@janleike

ML Researcher @AnthropicAI. Previously OpenAI & DeepMind.
Optimizing for a post-AGI future where humanity flourishes. Opinions aren't my employer's.

ID: 710610891058716673

Link: https://jan.leike.name/
Joined: 17-03-2016 23:36:53

697 Tweets

106K Followers

332 Following

Yanda Chen (@yanda_chen_)

My first paper at Anthropic is out! We show that Chains-of-Thought often don’t reflect models’ true reasoning, posing challenges for safety monitoring. It’s been an incredible 6 months pushing the frontier toward safe AGI with brilliant colleagues. Huge thanks to the team! 🙏

Sam Bowman (@sleepinyourhat)

🧵✨🙏 With the new Claude Opus 4, we conducted what I think is by far the most thorough pre-launch alignment assessment to date, aimed at understanding its values, goals, and propensities. Preparing it was a wild ride. Here’s some of what we learned. 🙏✨🧵