Tom Lieberum 🔸 (@lieberum_t) 's Twitter Profile
Tom Lieberum 🔸

@lieberum_t

Trying to reduce AGI x-risk by understanding NNs

Interpretability RE @DeepMind
BSc Physics from @RWTH
10% pledgee @ givingwhatwecan.org

ID: 1263043707684696064

Joined: 20-05-2020 09:47:52

944 Tweets

1.1K Followers

194 Following

Tom Lieberum 🔸 (@lieberum_t) 's Twitter Profile Photo

Extremely excited to finally get this into people's hands! Huge achievement by the whole mechinterp team at Google DeepMind! Thanks also to the Gemma team for their amazing support, and to the people at Neuronpedia for the absolutely stunning interactive demo!

nev (@neverrixx) 's Twitter Profile Photo

We quantized all Gemma Scopes into 4 bits, reducing memory and storage requirements by about 4 times with ~20% higher variance unexplained. You can try the quantized versions and make your own quantized SAEs with gist.github.com/neverix/79d519… DM me if you see bugs or have requests
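The tweet above describes compressing SAE weights to 4 bits at the cost of some reconstruction quality. A minimal sketch of the idea, using simple symmetric per-row quantization and the "fraction of variance unexplained" metric; this is an illustration, not the actual Gemma Scope quantization code (the names and scheme here are assumptions):

```python
# Hedged sketch: symmetric per-row 4-bit quantization of a weight matrix,
# plus a fraction-of-variance-unexplained (FVU) check on the round trip.
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Map each row of w to integers in [-7, 7] with a per-row scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

def fvu(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Fraction of variance unexplained: 0.0 means perfect reconstruction."""
    return float(((original - reconstructed) ** 2).sum()
                 / ((original - original.mean()) ** 2).sum())

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
print(fvu(w, w_hat))  # small but nonzero: 4-bit storage loses precision
```

Storing `q` as 4-bit integers (two per byte) plus one scale per row is roughly where the ~4× memory saving comes from relative to 16-bit floats.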

Anca Dragan (@ancadianadragan) 's Twitter Profile Photo

So freaking proud of the AGI safety & alignment team -- read here a retrospective of the work over the past 1.5 years across frontier safety, oversight, interpretability, and more. Onwards! alignmentforum.org/posts/79BPxvSs…

Buck Shlegeris (@bshlgrs) 's Twitter Profile Photo

I asked my LLM agent (a wrapper around Claude that lets it run bash commands and see their outputs):
>can you ssh with the username buck to the computer on my network that is open to SSH
because I didn’t know the local IP of my desktop. I walked away and promptly forgot I’d spun

Neel Nanda (@neelnanda5) 's Twitter Profile Photo

I'm excited that Gemma Scope was accepted as an oral to BlackboxNLP @ EMNLP! Check out Tom Lieberum 🔸's talk on it at 3pm ET today. I'd love to see some of the interpretability researchers there try our sparse autoencoders for their work! There's also now some videos to learn more:
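For readers new to the sparse autoencoders mentioned above, here is a minimal sketch of an SAE forward pass on a residual-stream activation. The shapes, untrained random weights, and plain-ReLU activation are illustrative assumptions; Gemma Scope's released SAEs actually use a JumpReLU variant, and real sparsity comes from training, not from the architecture alone:

```python
# Hedged sketch of a sparse autoencoder (SAE) forward pass: encode an
# activation vector into a wide feature space, then reconstruct it.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 512          # residual-stream width, SAE dictionary size

# Illustrative random parameters; a trained SAE learns these.
W_enc = rng.normal(0, 0.1, (d_model, d_sae)).astype(np.float32)
b_enc = np.zeros(d_sae, dtype=np.float32)
W_dec = rng.normal(0, 0.1, (d_sae, d_model)).astype(np.float32)
b_dec = np.zeros(d_model, dtype=np.float32)

def sae_forward(x: np.ndarray):
    """Encode activation x into non-negative features, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # feature activations (ReLU)
    x_hat = f @ W_dec + b_dec                # reconstruction of x
    return f, x_hat

x = rng.normal(size=d_model).astype(np.float32)
f, x_hat = sae_forward(x)
print((f > 0).sum(), "active features out of", d_sae)
```

With untrained weights roughly half the features fire; a trained SAE's sparsity penalty drives that count far lower, which is what makes individual features interpretable.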

Tilde (@tilderesearch) 's Twitter Profile Photo

Mechanistic interpretability is fascinating - but can it be useful? In particular, can it beat strong baselines like steering and prompting on downstream tasks that people care about? The answer is, resoundingly, yes. Our new blog post with Adam Karvonen, Sieve, dives into the

Tom Lieberum 🔸 (@lieberum_t) 's Twitter Profile Photo

Are you worried about risks from AGI and want to mitigate them? Come work with me and my colleagues! We're hiring on the AGI Safety & Alignment team (ASAT) and the Gemini Safety team! Research Engineers: boards.greenhouse.io/deepmind/jobs/… Research Scientists: boards.greenhouse.io/deepmind/jobs/…

Zac Kenton (@zackenton1) 's Twitter Profile Photo

We're hiring for our Google DeepMind AGI Safety & Alignment and Gemini Safety teams. Locations: London, NYC, Mountain View, SF. Join us to help build safe AGI. Research Engineer boards.greenhouse.io/deepmind/jobs/… Research Scientist boards.greenhouse.io/deepmind/jobs/…

David Lindner (@davlindner) 's Twitter Profile Photo

Had a great conversation with Daniel about our MONA paper. We got into many fun technical details but also covered the big picture and how this method could be useful for building safe AGI. Thanks for having me on!

Ed Turner (@edturner42) 's Twitter Profile Photo

1/8: The Emergent Misalignment paper showed that LLMs trained on insecure code then want to enslave humanity...?!

We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition

Rob Wiblin (@robertwiblin) 's Twitter Profile Photo

Huge repository of information about OpenAI and Altman just dropped — 'The OpenAI Files'. There's so much crazy shit in there. Here's what Claude highlighted to me: 1. Altman listed himself as Y Combinator chairman in SEC filings for years — a total fabrication (?!): "To