
Mikita Balesni 🇺🇦
@balesni
deception evals. reversal curse. latent reasoning. @apolloaisafety // best way to support 🇺🇦 savelife.in.ua/en/donate-en/
ID: 1551270667
https://www.mikitabalesni.com 27-06-2013 18:36:37
367 Tweets
461 Followers
587 Following







I am grateful to have worked closely with Tomek Korbak, Mikita Balesni 🇺🇦, Rohin Shah and Vlad Mikulik on this paper, and I am very excited that researchers across many prominent AI institutions collaborated with us and came to consensus around this important direction.


