Daniel Murfet (@danielmurfet)'s Twitter Profile
Daniel Murfet

@danielmurfet

Mathematician at the University of Melbourne. Working on Singular Learning Theory and AI alignment.

ID: 617213120

Website: http://www.therisingsea.org · Joined: 24-06-2012 15:26:49

3.3K Tweets

851 Followers

518 Following

davidad 🎇 (@davidad):

At 🇬🇧ARIA, we’re serious about catalysing a new paradigm for AI deployment—techniques to safely *contain* powerful AI (instead of “making it safe”), especially for improving the performance and resilience of critical infrastructure.

This needs a new org.

Want to be its founder?
Christopher Potts (@chrisgpotts):

For a Goodfire/Anthropic meet-up later this month, I wrote a discussion doc: "Assessing skeptical views of interpretability research". Spoiler: it's an incredible moment for interpretability research. The skeptical views sound like a call to action to me. Link just below.

Tom Burns (@tfburns):

Could the key to more efficient & robust language models come from computational neuroscience? Our paper demonstrates how brain-inspired architectures can enhance in-context learning in Transformers and LLMs. (1/15)
Alex Strick van Linschoten (@strickvl):

In parallel I'd been exploring how to make LLMs tangible, i.e. as physical artifacts, not just plots. I started a small project to 'knit' a model in the physical world by mapping token probabilities/attention/layer interactions into a 20×20, three-colour pattern, then render it in
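The tweet is cut off above, but the kind of mapping it describes can be sketched. A minimal illustration, assuming random stand-in data for the model statistics and a tercile binning that is my own choice, not the project's actual scheme:

```python
# Hypothetical sketch: quantise a 20x20 grid of model statistics
# (token probabilities / attention weights) into a three-colour knitting chart.
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random((20, 20))          # stand-in data, not output from any real model

# Bin each stitch into one of three colours by terciles of the observed values.
terciles = np.quantile(probs, [1/3, 2/3])
chart = np.digitize(probs, terciles)  # 0, 1, or 2 per stitch

symbols = {0: ".", 1: "o", 2: "#"}    # one symbol per yarn colour
for row in chart:
    print("".join(symbols[c] for c in row))
```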
Pratyush Maini (@pratyushmaini):

1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today DatologyAI shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Jim Halverson (@jhhalverson):

Grateful to Simons Foundation for their support of the Physics of Learning, and glad to be a part of this collaboration! Excited to see many breakthroughs in the coming years.

Greg Jefferis (@gsxej):

Neuronal diversity is written in transcriptional codes 🧬. But what is the logic of these codes that define cell types and wiring patterns?
To find out we built a #scRNAseq developmental atlas of the Drosophila nerve cord and linked it to the #connectome 🪰🧠
Tweeprint! ⬇️1/8
Goodfire (@goodfireai):

(6/7) Of course, a full solution also requires tools to mitigate those behaviors once they've been identified - and we're building those, e.g. via behavior steering. We think interp will be core to this - and more broadly, to debugging training for alignment and reliability!
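One common form of behavior steering is adding a fixed direction to a model's activations at inference time. Below is a minimal PyTorch sketch of that general idea; it is not Goodfire's implementation, and the layer path and `direction` vector in the usage note are placeholder assumptions.

```python
# Minimal activation-steering sketch: shift one layer's output along a
# chosen direction during the forward pass via a PyTorch forward hook.
import torch

def make_steering_hook(direction: torch.Tensor, scale: float = 4.0):
    """Return a forward hook that adds `scale * direction` to a layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction  # push activations toward the behaviour
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Usage (assuming a GPT-2-style HuggingFace model and a unit vector `direction`
# in the residual stream, e.g. derived from a learned feature):
# handle = model.transformer.h[10].register_forward_hook(make_steering_hook(direction))
# ... generate with the behaviour shifted ...
# handle.remove()
```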

Tom McGrath (@banburismus_):

post-training is weird, and can have all sorts of surprising side effects - extreme sycophancy, hallucinations, mechahitler... what can we do? we have a great new technique for surfacing unexpected behaviours during finetuning that might help!

Marcus Hutter (@mhutter42):

Reflective-Oracle AIXI solves the Grain of Truth problem for super-intelligent multi-agent systems/societies. Finally, the long-awaited, more comprehensive treatment building on earlier work from the last decade is out. Slides: hutter1.net/publ/sgot.pdf Paper: arxiv.org/abs/2508.16245
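For background, a standard statement from Bayesian sequence prediction (not taken from the linked paper): the single-agent grain-of-truth condition says the true environment gets positive prior weight in the Bayes mixture, which forces the mixture's predictions to merge with the truth.

```latex
% Bayes mixture over a countable environment class \mathcal{M}:
\[
  \xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x_{1:n}),
  \qquad w_\nu > 0, \quad \sum_{\nu} w_\nu \le 1 .
\]
% Grain of truth: the true environment \mu lies in \mathcal{M}, so \xi dominates \mu,
\[
  \xi(\cdot) \;\ge\; w_\mu \, \mu(\cdot)
  \quad\Longrightarrow\quad
  \xi(x_n \mid x_{<n}) \to \mu(x_n \mid x_{<n})
  \quad \text{with } \mu\text{-probability } 1 .
\]
```

The multi-agent obstacle, which the reflective-oracle construction addresses, is that each agent's environment contains the other Bayesian agents, so the class must be rich enough to contain the optimal policies for beliefs over that very class: a fixed point that reflective oracles supply.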
Daniel Filan (@dfrsrchtwts):

yearn to contemplate the platonic forms? captivated by the geometry of balls rolling down valleys something something rainbow serpent something something cell biology? apply to work with Daniel Murfet and Jesse Hoogland in the Winter MATS cohort by Oct 2.
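The "balls rolling down valleys" image is gradient descent on a degenerate loss landscape, the central object of singular learning theory. A toy illustration (my own construction, not from the tweet):

```python
# Toy loss with a singular set of minima: L(a, b) = (a*b)**2. The zero set
# {a = 0} union {b = 0} is a pair of crossing valleys, singular at the origin.
def grad(a, b):
    return 2 * a * b**2, 2 * a**2 * b  # dL/da, dL/db for L = a^2 * b^2

for a0, b0 in [(1.5, 0.7), (0.4, -1.2)]:
    a, b = a0, b0
    for _ in range(500):
        ga, gb = grad(a, b)
        a, b = a - 0.05 * ga, b - 0.05 * gb
    print(f"start ({a0}, {b0}) -> valley point ({a:.3f}, {b:.3f}), loss {(a*b)**2:.1e}")
# Different starts land at different points of the valley: the minimum is a
# set, not a point, which is where regular asymptotics fail and SLT applies.
```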

Joshua Batson (@thebasepoint):

This is a neat approach to attribution! It leaves open a question that we couldn't answer either: how to properly attribute through attention *patterns* to features, in a "relevance"/"influence"-spirited way.

Miles Brundage (@miles_brundage):

It's not 100% clear what would count as sufficient evidence that a restructured OpenAI would serve the nonprofit mission. But excellent safety practices + a binding commitment to credible, ongoing external assurance thereof do seem like a bare minimum. x.com/GarrisonLovely…

Brenden Lake (@lakebrenden):

Our new lab for Human & Machine Intelligence is officially open at Princeton University!

Consider applying for a PhD or Postdoc position, either through the depts. of Computer Science or Psychology. You can register interest on our new website lake-lab.github.io (1/2)
Eric J. Michaud (@ericjmichaud_):

During my summer at Goodfire, I ended up thinking a bit about sparse autoencoder scaling laws, and whether the existence of "feature manifolds" could impact SAE scaling behavior, with Liv and Tom McGrath 🙏: arxiv.org/abs/2509.02565
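For readers new to the object under discussion, here is a minimal sparse autoencoder sketch in PyTorch, just to fix notation; the dimensions and L1 penalty are placeholder choices, not the paper's setup.

```python
# Minimal sparse autoencoder (SAE): reconstruct activations through an
# overcomplete dictionary of features, with an L1 penalty encouraging sparsity.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)  # dictionary of learned features
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))        # sparse feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder()
x = torch.randn(8, 512)                        # stand-in model activations
x_hat, feats = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * feats.abs().mean()  # recon + L1 sparsity
print(loss.item())
# The scaling question in the tweet: if features lie on low-dimensional
# manifolds rather than being discrete directions, how does loss fall as
# the dictionary size d_dict grows?
```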