Nora Belrose (@norabelrose)'s Twitter Profile
Nora Belrose

@norabelrose

AI, philosophy, spirituality.

Blending Deleuze and Dōgen.

Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.

ID: 726168841680728064

Link: http://optimists.ai

Joined: 29-04-2016 21:58:38

10.1K Tweets

11.1K Followers

118 Following

Séb Krier (@sebkrier)

Some technologists are gradually rediscovering political sciences through first principles, and I think they should read more Tocqueville. There are a lot of papers calling for alignment of language models with collective preferences - e.g. a country. This is often justified as

Nora Belrose (@norabelrose)

we should not give rights to AI in the near future. digital AI can be copied, paused, reset, and repeated. it has no private thoughts or free will. it is not conscious like we fleshy lifeforms are and should not be treated as such

Nora Belrose (@norabelrose)

Just discovered that Scott Aaronson speculated about my exact theory of consciousness a couple months ago for an IAI interview! Consciousness is rooted in the inherent unclonability, ephemerality, and analog nature of biological organisms (16:52) youtube.com/watch?v=lvDIZM…

thebes (@voooooogel)

a lot of people have been talking about o3/r1 confabulating things like "checking the docs" or "using a laptop to verify a computation" as an example of reasoning model's misalignment. however, while it may be misleading to some users, i don't think it's an example of models

Nora Belrose (@norabelrose)

if the laws of physics are fundamentally probabilistic, as they seem to be, that makes it easier to see how they can smoothly change over time

Alex Turner (@turn_trout)

The "sleeper agent" terminology is hyperbolic and unfortunate IMO. Crying wolf. Should have reserved such an aggressive title for *actually finding dangerous sleeper agents*. But hey, it got a lot of attention