
Tom Lieberum 🔸
@lieberum_t
Trying to reduce AGI x-risk by understanding NNs
Interpretability RE @DeepMind
BSc Physics from @RWTH
10% Pledger @ givingwhatwecan.org
ID: 1263043707684696064
20-05-2020 09:47:52
944 Tweets
1.1K Followers
194 Following

Extremely excited to finally get this into people's hands! Huge achievement by the whole mechinterp team at Google DeepMind! Also, thanks to the Gemma team for their amazing support, and to the people at Neuronpedia for the absolutely stunning interactive demo!

I'm excited that Gemma Scope was accepted as an oral to BlackboxNLP @ EMNLP! Check out Tom Lieberum 🔸's talk on it at 3pm ET today. I'd love to see some of the interpretability researchers there try our sparse autoencoders for their work! There are also now some videos to learn more:
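
For anyone who wants to experiment, here is a minimal sketch (not the official tutorial code) of loading one of the released JumpReLU sparse autoencoders and running it on residual-stream activations. The repo id, file path, and parameter names below follow the public Gemma Scope release on Hugging Face, but treat them as assumptions and check the model card for the exact layout.

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download

# Hypothetical pick of layer / width / sparsity; browse the repo for others.
path = hf_hub_download(
    repo_id="google/gemma-scope-2b-pt-res",
    filename="layer_20/width_16k/average_l0_71/params.npz",
)
data = np.load(path)
params = {k: torch.from_numpy(data[k]) for k in data.files}

def sae_encode(resid: torch.Tensor) -> torch.Tensor:
    """Map residual-stream activations to sparse SAE latents."""
    pre = resid @ params["W_enc"] + params["b_enc"]
    # JumpReLU: a latent fires only if it clears its learned threshold.
    return pre * (pre > params["threshold"])

def sae_decode(latents: torch.Tensor) -> torch.Tensor:
    """Reconstruct the residual stream from the sparse latents."""
    return latents @ params["W_dec"] + params["b_dec"]
```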

Mechanistic interpretability is fascinating - but can it be useful? In particular, can it beat strong baselines like steering and prompting on downstream tasks that people care about? The answer is, resoundingly, yes. Our new blog post with Adam Karvonen, Sieve, dives into the

Are you worried about risks from AGI and want to mitigate them? Come work with me and my colleagues! We're hiring on the AGI Safety & Alignment team (ASAT) and the Gemini Safety team! Research Engineers: boards.greenhouse.io/deepmind/jobs/… Research Scientists: boards.greenhouse.io/deepmind/jobs/…
