Nora Belrose (@norabelrose) Twitter Tweets • TwiCopy

Nora Belrose

@norabelrose

AI, philosophy, spirituality.

Blending Deleuze and Dōgen.

Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.

ID: 726168841680728064

Link: http://optimists.ai · Joined: 29-04-2016 21:58:38

10,10K Tweet

11,11K Followers

118 Following

Séb Krier

Some technologists are gradually rediscovering political sciences through first principles, and I think they should read more Tocqueville. There are a lot of papers calling for alignment of language models with collective preferences - e.g. a country. This is often justified as

450 likes · 36 replies · 69 retweets

Nora Belrose

@norabelrose

4 months ago

we should not give rights to AI in the near future
digital AI can be copied, paused, reset, and repeated. it has no private thoughts or free will
it is not conscious like we fleshy lifeforms are and should not be treated as such

279 likes · 69 replies · 11 retweets

Nora Belrose

@norabelrose

4 months ago

Just discovered that Scott Aaronson speculated about my exact theory of consciousness a couple months ago for an IAI interview! Consciousness is rooted in the inherent unclonability, ephemerality, and analog nature of biological organisms (16:52) youtube.com/watch?v=lvDIZM…

110 likes · 15 replies · 10 retweets

Nora Belrose

@norabelrose

3 months ago

15 likes · 0 replies · 0 retweets

thebes

@voooooogel

3 months ago

a lot of people have been talking about o3/r1 confabulating things like "checking the docs" or "using a laptop to verify a computation" as an example of reasoning model's misalignment. however, while it may be misleading to some users, i don't think it's an example of models

691 likes · 30 replies · 68 retweets

Nora Belrose

@norabelrose

3 months ago

Vesak procession in Lumbini, Nepal (the Buddha's birthplace)

17 likes · 0 replies · 2 retweets

Nora Belrose

@norabelrose

3 months ago

data attribution is the most neglected thing in interpretability and people should join me in working on it

153 likes · 15 replies · 4 retweets

Nora Belrose

@norabelrose

2 months ago

if the laws of physics are fundamentally probabilistic, as they seem to be, that makes it easier to see how they can smoothly change over time

45 likes · 10 replies · 1 retweet

Alex Turner

@turn_trout

a month ago

The "sleeper agent" terminology is hyperbolic and unfortunate IMO. Crying wolf. Should have reserved such an aggressive title for *actually finding dangerous sleeper agents*. But hey, it got a lot of attention

43 likes · 3 replies · 4 retweets
