Tom Lieberum 🔸 (@lieberum_t) 's Twitter Profile
Tom Lieberum 🔸

@lieberum_t

Trying to reduce AGI x-risk by understanding NNs

Interpretability RE @DeepMind
BSc Physics from @RWTH
10% pledgee @ givingwhatwecan.org

ID: 1263043707684696064

Joined: 20-05-2020 09:47:52

944 Tweets

1.1K Followers

194 Following

Tom Lieberum 🔸 (@lieberum_t) 's Twitter Profile Photo

Extremely excited to finally get this into people's hands! Huge achievement by the whole mechinterp team at Google DeepMind! Thanks also to the Gemma team for their amazing support, and to the people at Neuronpedia for the absolutely stunning interactive demo!

nev (@neverrixx) 's Twitter Profile Photo

We quantized all Gemma Scopes into 4 bits, reducing memory and storage requirements by about 4 times with ~20% higher variance unexplained. You can try the quantized versions and make your own quantized SAEs with gist.github.com/neverix/79d519… DM me if you see bugs or have requests
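The tweet above describes compressing SAE weights to 4 bits at the cost of some reconstruction quality. A minimal sketch of the idea, using simple symmetric per-row quantization and the "fraction of variance unexplained" metric; this is an illustration, not the actual Gemma Scope quantization code (the names and scheme here are assumptions):

```python
# Hedged sketch: symmetric per-row 4-bit quantization of a weight matrix,
# plus a fraction-of-variance-unexplained (FVU) check on the round trip.
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Map each row of w to integers in [-7, 7] with a per-row scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

def fvu(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Fraction of variance unexplained: 0.0 means perfect reconstruction."""
    return float(((original - reconstructed) ** 2).sum()
                 / ((original - original.mean()) ** 2).sum())

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
print(fvu(w, w_hat))  # small but nonzero: 4-bit storage loses precision
```

Storing `q` as 4-bit integers (two per byte) plus one scale per row is roughly where the ~4× memory saving comes from relative to 16-bit floats.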

Anca Dragan (@ancadianadragan) 's Twitter Profile Photo

So freaking proud of the AGI safety & alignment team -- read here a retrospective of the work over the past 1.5 years across frontier safety, oversight, interpretability, and more. Onwards! alignmentforum.org/posts/79BPxvSs…

Buck Shlegeris (@bshlgrs) 's Twitter Profile Photo

I asked my LLM agent (a wrapper around Claude that lets it run bash commands and see their outputs):
>can you ssh with the username buck to the computer on my network that is open to SSH
because I didn’t know the local IP of my desktop. I walked away and promptly forgot I’d spun

Neel Nanda (@neelnanda5) 's Twitter Profile Photo

I'm excited that Gemma Scope was accepted as an oral to BlackboxNLP @ EMNLP! Check out Tom Lieberum 🔸's talk on it at 3pm ET today. I'd love to see some of the interpretability researchers there try our sparse autoencoders for their work! There's also now some videos to learn more:
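For readers new to the sparse autoencoders mentioned above, here is a minimal sketch of an SAE forward pass on a residual-stream activation. The shapes, untrained random weights, and plain-ReLU activation are illustrative assumptions; Gemma Scope's released SAEs actually use a JumpReLU variant, and real sparsity comes from training, not from the architecture alone:

```python
# Hedged sketch of a sparse autoencoder (SAE) forward pass: encode an
# activation vector into a wide feature space, then reconstruct it.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 512          # residual-stream width, SAE dictionary size

# Illustrative random parameters; a trained SAE learns these.
W_enc = rng.normal(0, 0.1, (d_model, d_sae)).astype(np.float32)
b_enc = np.zeros(d_sae, dtype=np.float32)
W_dec = rng.normal(0, 0.1, (d_sae, d_model)).astype(np.float32)
b_dec = np.zeros(d_model, dtype=np.float32)

def sae_forward(x: np.ndarray):
    """Encode activation x into non-negative features, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # feature activations (ReLU)
    x_hat = f @ W_dec + b_dec                # reconstruction of x
    return f, x_hat

x = rng.normal(size=d_model).astype(np.float32)
f, x_hat = sae_forward(x)
print((f > 0).sum(), "active features out of", d_sae)
```

With untrained weights roughly half the features fire; a trained SAE's sparsity penalty drives that count far lower, which is what makes individual features interpretable.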

Tilde (@tilderesearch) 's Twitter Profile Photo

Mechanistic interpretability is fascinating - but can it be useful? In particular, can it beat strong baselines like steering and prompting on downstream tasks that people care about? The answer is, resoundingly, yes. Our new blog post with Adam Karvonen, Sieve, dives into the

Tom Lieberum 🔸 (@lieberum_t) 's Twitter Profile Photo

Are you worried about risks from AGI and want to mitigate them? Come work with me and my colleagues! We're hiring on the AGI Safety & Alignment team (ASAT) and the Gemini Safety team! Research Engineers: boards.greenhouse.io/deepmind/jobs/… Research Scientists: boards.greenhouse.io/deepmind/jobs/…

Zac Kenton (@zackenton1) 's Twitter Profile Photo

We're hiring for our Google DeepMind AGI Safety & Alignment and Gemini Safety teams. Locations: London, NYC, Mountain View, SF. Join us to help build safe AGI. Research Engineer boards.greenhouse.io/deepmind/jobs/… Research Scientist boards.greenhouse.io/deepmind/jobs/…

David Lindner (@davlindner) 's Twitter Profile Photo

Had a great conversation with Daniel about our MONA paper. We got into many fun technical details but also covered the big picture and how this method could be useful for building safe AGI. Thanks for having me on!

Ed Turner (@edturner42) 's Twitter Profile Photo

1/8: The Emergent Misalignment paper showed that LLMs trained on insecure code then want to enslave humanity...?!

We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition

Rob Wiblin (@robertwiblin) 's Twitter Profile Photo

Huge repository of information about OpenAI and Altman just dropped — 'The OpenAI Files'. There's so much crazy shit in there. Here's what Claude highlighted to me: 1. Altman listed himself as Y Combinator chairman in SEC filings for years — a total fabrication (?!): "To