Jack Merullo (@jack_merullo_) 's Twitter Profile
Jack Merullo

@jack_merullo_

Interpretability @GoodfireAI. PhD student @BrownUniversity. Previously @allen_ai and @GoogleAI

ID: 1547342006128586752

Website: https://jmerullo.github.io/ | Joined: 13-07-2022 22:08:02

118 Tweets

673 Followers

294 Following

Goodfire (@goodfireai) 's Twitter Profile Photo

We are excited to announce our collaboration with Arc Institute on their state-of-the-art biological foundation model, Evo 2. Our work reveals how models like Evo 2 process biological information - from DNA to proteins - in ways we can now decode.

William Merrill (@lambdaviking) 's Twitter Profile Photo

How does the depth of a transformer affect reasoning capabilities? New preprint by myself and Ashish Sabharwal shows that a little depth goes a long way to increase transformers’ expressive power. We take this as encouraging for further research on looped transformers! 🧵

Apoorv Khandelwal (@apoorvkh) 's Twitter Profile Photo

We made a library (torchrunx) to make multi-GPU / multi-node PyTorch easier, more robust, and more modular! 🧵 github.com/apoorvkh/torch… Docs: torchrun.xyz `(uv) pip install torchrunx` today! (w/ the very talented Peter Curtin, Brown CS '25)
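
As a rough, hypothetical sketch of what launching a distributed training function through torchrunx might look like: the entry point and parameter names below (`torchrunx.launch`, `hostnames`, `workers_per_host`) are assumptions rather than confirmed API, so check the actual docs at torchrun.xyz before using.

```python
# Hypothetical sketch only -- the torchrunx calls shown here are assumed,
# not copied from the official docs (torchrun.xyz).
import os

import torch
import torch.distributed as dist
import torchrunx  # (uv) pip install torchrunx


def train():
    # Each worker process runs this function; torchrun-style env vars
    # (RANK, LOCAL_RANK, MASTER_ADDR, ...) are assumed to be set by the launcher.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{os.environ.get('LOCAL_RANK', rank)}")
    model = torch.nn.Linear(10, 10).to(device)
    model = torch.nn.parallel.DistributedDataParallel(model)
    # ... training loop would go here ...
    dist.destroy_process_group()
    return rank


if __name__ == "__main__":
    # Assumed entry point: launch `train` across hosts/GPUs from plain Python
    # instead of shelling out to `torchrun`.
    torchrunx.launch(
        func=train,
        hostnames=["localhost"],  # assumed parameter name
        workers_per_host=2,       # assumed parameter name (e.g., GPUs per node)
    )
```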

Thariq (@trq212) 's Twitter Profile Photo

✨ New AI Interfaces powered by Interpretability: I'm excited to share LatentLit, the result of my applied AI research fellowship with Goodfire. Mechanistic interpretability isn’t just important for AI safety, it also gives us new ways to steer and interact with LLMs.

Tom McGrath (@banburismus_) 's Twitter Profile Photo

I’m a bit confused by this - perhaps due to differences of opinion in what ‘fundamental SAE research’ is and what interpretability is for. This is why I prefer to talk about interpreter models rather than SAEs - we’re attached to the end goal, not the details of methodology. The

David Bau (@davidbau) 's Twitter Profile Photo

GDM's AGI safety document is great and worth a read. But their focus on *robustness* of technology neglects *resilience* of the larger ecosystem. To build resilience, we need to empower people, and build a third way between open and closed models. resilience.baulab.info/docs/NDIF-resi…

Jack Merullo (@jack_merullo_) 's Twitter Profile Photo

I joined Goodfire a little over a month ago to do interpretability! I am really excited to extend my work beyond just LMs. I think interp has a lot to offer to e.g., scientific models. Understanding them might actually teach us something new about the world 🌎

Goodfire (@goodfireai) 's Twitter Profile Photo

What goes on inside the mind of a reasoning model? Today we're releasing the first open-source sparse autoencoders (SAEs) trained on DeepSeek's 671B parameter reasoning model, R1—giving us new tools to understand and steer model thinking. Why does this matter?

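For context on what an SAE is: a sparse autoencoder reconstructs a model's internal activations through a wide, sparsity-penalized bottleneck, so that individual latents tend to align with interpretable features. Below is a minimal, generic PyTorch sketch with made-up dimensions; it is not the architecture or training recipe of the released R1 SAEs.

```python
# Generic sparse autoencoder (SAE) sketch -- illustrative only, with
# assumed/toy dimensions; not the released R1 SAEs.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        # Overcomplete dictionary: d_latent >> d_model.
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, activations: torch.Tensor):
        # Sparse, non-negative feature activations.
        latents = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(latents)
        return reconstruction, latents


def sae_loss(reconstruction, activations, latents, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparsity.
    mse = torch.mean((reconstruction - activations) ** 2)
    sparsity = l1_coeff * latents.abs().mean()
    return mse + sparsity


if __name__ == "__main__":
    # Stand-in for residual-stream activations collected from an LLM.
    d_model, d_latent = 512, 8192  # toy sizes, assumed
    acts = torch.randn(64, d_model)

    sae = SparseAutoencoder(d_model, d_latent)
    recon, latents = sae(acts)
    loss = sae_loss(recon, acts, latents)
    loss.backward()
    print(loss.item())
```
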
Suraj Anand ICLR 2025 (@surajk610) 's Twitter Profile Photo

Excited to be at #ICLR2025 in a few days to present this work with Michael Lepori! Interested in chatting about training dynamics, mechinterp, memory-efficient training, info theory or anything else! Please dm me.

Benjamin Spiegel (@superspeeg) 's Twitter Profile Photo

Why did only humans invent graphical systems like writing? 🧠✍️ In our new paper at CogSci Society, we explore how agents learn to communicate using a model of pictographic signification similar to human proto-writing. 🧵👇

Aaron Mueller (@amuuueller) 's Twitter Profile Photo

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!

Goodfire (@goodfireai) 's Twitter Profile Photo

We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇
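
As a rough illustration of "painting with latent concepts": one common interpretability-based steering recipe is to add a learned concept direction to a model's internal activations, optionally only inside a user-painted spatial mask. The sketch below is a generic, hypothetical version of that idea (all tensor shapes, names, and scales are made up); it is not the actual Paint with Ember implementation.

```python
# Hypothetical sketch of concept-based steering for an image model.
# All shapes, names, and scales are made up; not the Ember implementation.
import torch


def paint_concept(activations: torch.Tensor,
                  concept_direction: torch.Tensor,
                  mask: torch.Tensor,
                  strength: float = 4.0) -> torch.Tensor:
    """Add a concept direction to spatial activations inside a painted mask.

    activations:       (channels, height, width) feature map from the image model
    concept_direction: (channels,) vector for a learned concept
    mask:              (height, width) values in [0, 1] from the user's brush
    """
    direction = concept_direction / concept_direction.norm()
    # Broadcast the direction over the painted region only.
    return activations + strength * mask.unsqueeze(0) * direction.view(-1, 1, 1)


if __name__ == "__main__":
    c, h, w = 64, 32, 32          # toy feature-map size
    acts = torch.randn(c, h, w)   # stand-in for an image model's activations
    concept = torch.randn(c)      # stand-in for a learned concept direction
    brush = torch.zeros(h, w)
    brush[8:24, 8:24] = 1.0       # "paint" a square region
    out = paint_concept(acts, concept, brush)
    print(out.shape)
```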