Daniel Murfet (@danielmurfet)'s Twitter Profile
Daniel Murfet

@danielmurfet

Mathematician at the University of Melbourne. Working on Singular Learning Theory and AI alignment.

ID: 617213120

Website: http://www.therisingsea.org · Joined: 24-06-2012 15:26:49

3.3K Tweets

851 Followers

518 Following

davidad 🎇 (@davidad):

At 🇬🇧ARIA, we’re serious about catalysing a new paradigm for AI deployment—techniques to safely *contain* powerful AI (instead of “making it safe”), especially for improving the performance and resilience of critical infrastructure.

This needs a new org.

Want to be its founder?
Christopher Potts (@chrisgpotts):

For a Goodfire/Anthropic meet-up later this month, I wrote a discussion doc: "Assessing skeptical views of interpretability research". Spoiler: it's an incredible moment for interpretability research. The skeptical views sound like a call to action to me. Link just below.

Tom Burns (@tfburns):

Could the key to more efficient & robust language models come from computational neuroscience? Our paper demonstrates how brain-inspired architectures can enhance in-context learning in Transformers and LLMs. (1/15)
Alex Strick van Linschoten (@strickvl):

In parallel I'd been exploring how to make LLMs tangible, i.e. as physical artifacts, not just plots. I started a small project to 'knit' a model in the physical world by mapping token probabilities/attention/layer interactions into a 20×20, three-colour pattern, then render it in
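The tweet is cut off above, but the kind of mapping it describes can be sketched. A minimal illustration, assuming random stand-in data for the model statistics and a tercile binning that is my own choice, not the project's actual scheme:

```python
# Hypothetical sketch: quantise a 20x20 grid of model statistics
# (token probabilities / attention weights) into a three-colour knitting chart.
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random((20, 20))          # stand-in data, not output from any real model

# Bin each stitch into one of three colours by terciles of the observed values.
terciles = np.quantile(probs, [1/3, 2/3])
chart = np.digitize(probs, terciles)  # 0, 1, or 2 per stitch

symbols = {0: ".", 1: "o", 2: "#"}    # one symbol per yarn colour
for row in chart:
    print("".join(symbols[c] for c in row))
```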
Pratyush Maini (@pratyushmaini):

1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today DatologyAI shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Jim Halverson (@jhhalverson):

Grateful to Simons Foundation for their support of the Physics of Learning, and glad to be a part of this collaboration! Excited to see many breakthroughs in the coming years.

Greg Jefferis (@gsxej):

Neuronal diversity is written in transcriptional codes 🧬. But what is the logic of these codes that define cell types and wiring patterns?
To find out we built a #scRNAseq developmental atlas of the Drosophila nerve cord and linked it to the #connectome 🪰🧠
Tweeprint! ⬇️1/8
Goodfire (@goodfireai):

(6/7) Of course, a full solution also requires tools to mitigate those behaviors once they've been identified - and we're building those, e.g. via behavior steering. We think interp will be core to this - and more broadly, to debugging training for alignment and reliability!
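One common form of behavior steering is adding a fixed direction to a model's activations at inference time. Below is a minimal PyTorch sketch of that general idea; it is not Goodfire's implementation, and the layer path and `direction` vector in the usage note are placeholder assumptions.

```python
# Minimal activation-steering sketch: shift one layer's output along a
# chosen direction during the forward pass via a PyTorch forward hook.
import torch

def make_steering_hook(direction: torch.Tensor, scale: float = 4.0):
    """Return a forward hook that adds `scale * direction` to a layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction  # push activations toward the behaviour
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Usage (assuming a GPT-2-style HuggingFace model and a unit vector `direction`
# in the residual stream, e.g. derived from a learned feature):
# handle = model.transformer.h[10].register_forward_hook(make_steering_hook(direction))
# ... generate with the behaviour shifted ...
# handle.remove()
```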

Tom McGrath (@banburismus_):

post-training is weird, and can have all sorts of surprising side effects - extreme sycophancy, hallucinations, mechahitler... what can we do? we have a great new technique for surfacing unexpected behaviours during finetuning that might help!

Marcus Hutter (@mhutter42):

Reflective-Oracle AIXI solves the Grain of Truth problem for super-intelligent multi-agent systems/societies. Finally, the long-awaited, more comprehensive treatment building on earlier work from the last decade is out. Slides: hutter1.net/publ/sgot.pdf Paper: arxiv.org/abs/2508.16245
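For background, a standard statement from Bayesian sequence prediction (not taken from the linked paper): the single-agent grain-of-truth condition says the true environment gets positive prior weight in the Bayes mixture, which forces the mixture's predictions to merge with the truth.

```latex
% Bayes mixture over a countable environment class \mathcal{M}:
\[
  \xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x_{1:n}),
  \qquad w_\nu > 0, \quad \sum_{\nu} w_\nu \le 1 .
\]
% Grain of truth: the true environment \mu lies in \mathcal{M}, so \xi dominates \mu,
\[
  \xi(\cdot) \;\ge\; w_\mu \, \mu(\cdot)
  \quad\Longrightarrow\quad
  \xi(x_n \mid x_{<n}) \to \mu(x_n \mid x_{<n})
  \quad \text{with } \mu\text{-probability } 1 .
\]
```

The multi-agent obstacle, which the reflective-oracle construction addresses, is that each agent's environment contains the other Bayesian agents, so the class must be rich enough to contain the optimal policies for beliefs over that very class: a fixed point that reflective oracles supply.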
Daniel Filan (@dfrsrchtwts):

yearn to contemplate the platonic forms? captivated by the geometry of balls rolling down valleys something something rainbow serpent something something cell biology? apply to work with Daniel Murfet and Jesse Hoogland in the Winter MATS cohort by Oct 2.
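The "balls rolling down valleys" image is gradient descent on a degenerate loss landscape, the central object of singular learning theory. A toy illustration (my own construction, not from the tweet):

```python
# Toy loss with a singular set of minima: L(a, b) = (a*b)**2. The zero set
# {a = 0} union {b = 0} is a pair of crossing valleys, singular at the origin.
def grad(a, b):
    return 2 * a * b**2, 2 * a**2 * b  # dL/da, dL/db for L = a^2 * b^2

for a0, b0 in [(1.5, 0.7), (0.4, -1.2)]:
    a, b = a0, b0
    for _ in range(500):
        ga, gb = grad(a, b)
        a, b = a - 0.05 * ga, b - 0.05 * gb
    print(f"start ({a0}, {b0}) -> valley point ({a:.3f}, {b:.3f}), loss {(a*b)**2:.1e}")
# Different starts land at different points of the valley: the minimum is a
# set, not a point, which is where regular asymptotics fail and SLT applies.
```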

Joshua Batson (@thebasepoint):

This is a neat approach to attribution! It leaves open a question that we couldn't answer either: how to properly attribute through attention *patterns* to features, in a "relevance"/"influence"-spirited way.

Miles Brundage (@miles_brundage):

It's not 100% clear what would count as sufficient evidence that a restructured OpenAI would serve the nonprofit mission. But excellent safety practices + a binding commitment to credible, ongoing external assurance thereof do seem like a bare minimum. x.com/GarrisonLovely…

Brenden Lake (@lakebrenden):

Our new lab for Human & Machine Intelligence is officially open at Princeton University!

Consider applying for a PhD or Postdoc position, either through the depts. of Computer Science or Psychology. You can register interest on our new website lake-lab.github.io (1/2)
Eric J. Michaud (@ericjmichaud_):

During my summer at Goodfire, I ended up thinking a bit about sparse autoencoder scaling laws, and whether the existence of "feature manifolds" could impact SAE scaling behavior, with Liv and Tom McGrath 🙏: arxiv.org/abs/2509.02565
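For readers new to the object under discussion, here is a minimal sparse autoencoder sketch in PyTorch, just to fix notation; the dimensions and L1 penalty are placeholder choices, not the paper's setup.

```python
# Minimal sparse autoencoder (SAE): reconstruct activations through an
# overcomplete dictionary of features, with an L1 penalty encouraging sparsity.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)  # dictionary of learned features
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))        # sparse feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder()
x = torch.randn(8, 512)                        # stand-in model activations
x_hat, feats = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * feats.abs().mean()  # recon + L1 sparsity
print(loss.item())
# The scaling question in the tweet: if features lie on low-dimensional
# manifolds rather than being discrete directions, how does loss fall as
# the dictionary size d_dict grows?
```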