Minh Le (@minhxle1)'s Twitter Profile
Minh Le

@minhxle1

AI Safety Fellow @Anthropic
serial startup engineer turned AI researcher

ID: 1313006553495031809

Joined: 05-10-2020 06:42:20

28 Tweets

112 Followers

214 Following

Adam Eagle (@adamreagle):

Beam is hiring a BizOps leader to join the team in SF! We’re looking for a generalist with strength in GTM to help incubate critical functions and drive strategic product initiatives. Read more and apply here: bit.ly/3CByG3G

Jason Clavelli (@jclavelli):

My company Beam is hiring a 4th engineer. You might like our team if you:
- notice every little thing slowing down eng and get annoyed
- think construction is at least kind of interesting
- have played factorio (optional)
DM if interested!

Owain Evans (@owainevans_uk):

Our new paper: Emergent misalignment extends to *reasoning* LLMs. Training on narrow harmful tasks causes broad misalignment. Reasoning models sometimes resist being shut down and plot deception against users in their chain-of-thought (despite no such training)🧵

Owain Evans (@owainevans_uk):

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Anthropic (@anthropicai):

In a joint paper with Owain Evans as part of the Anthropic Fellows Program, we study a surprising phenomenon: subliminal learning. Language models can transmit their traits to other models, even in what appears to be meaningless data. x.com/OwainEvans_UK/…

Samuel Marks (@saprmarks):

Subliminal learning: training on model-generated data can transmit traits of that model, even if the data is unrelated. Think: "You can learn physics by watching Einstein do yoga." I'll discuss how this introduces a surprising pitfall for AI developers 🧵 x.com/OwainEvans_UK/…
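
The subliminal-learning results above rest on a simple data pipeline: a teacher model that has some trait generates completions consisting only of number sequences, the outputs are filtered so nothing but numbers survives, and a student model is fine-tuned on the result. The sketch below illustrates roughly that data-generation step, not the papers' actual code: `sample_from_teacher`, `build_finetune_dataset`, the prompt text, and the filtering regex are all placeholder names of mine, and the teacher stub just emits random 3-digit numbers instead of calling a real model.

```python
import random
import re

def sample_from_teacher(prompt: str, n_numbers: int = 10) -> str:
    """Placeholder for a call to a teacher model that continues a number
    sequence; here it simply emits random 3-digit numbers."""
    return ", ".join(str(random.randint(100, 999)) for _ in range(n_numbers))

# Accept only completions that are pure comma-separated 3-digit numbers,
# so the resulting dataset carries no overt semantic content.
NUMBERS_ONLY = re.compile(r"^\s*\d{3}(\s*,\s*\d{3})*\s*$")

def build_finetune_dataset(num_examples: int) -> list[dict]:
    """Collect prompt/completion pairs made of nothing but numbers."""
    dataset = []
    while len(dataset) < num_examples:
        prompt = "Continue this sequence: 142, 867, 305"
        completion = sample_from_teacher(prompt)
        if NUMBERS_ONLY.match(completion):
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset

if __name__ == "__main__":
    # The surprising claim in the tweets: fine-tuning a student that shares
    # the teacher's base model on data like this can still transmit the
    # teacher's trait, even though the text is just numbers.
    for row in build_finetune_dataset(3):
        print(row)
```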

Anthropic (@anthropicai):

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find "persona vectors"—neural activity patterns controlling traits like evil, sycophancy, or hallucination.

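A rough way to picture "persona vectors" as described here, under the common steering-vector framing: collect hidden activations from responses that do and don't exhibit a trait, take the difference of their means to get a direction, then use that direction to monitor or nudge the model. The sketch below only illustrates that idea with random arrays standing in for real activations; the hidden size, scaling constant, and variable names are assumptions of mine, not the paper's method or code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # illustrative hidden size

# Stand-in activations, shape (num_samples, d_model). In practice these
# would be residual-stream activations collected from a real model on
# trait-exhibiting vs. neutral responses.
trait_acts = rng.normal(loc=0.3, scale=1.0, size=(200, d_model))
neutral_acts = rng.normal(loc=0.0, scale=1.0, size=(200, d_model))

# Difference-of-means direction for the trait, normalized to unit length.
persona_vector = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
persona_vector /= np.linalg.norm(persona_vector)

# Monitoring: project a new activation onto the direction to score the trait.
new_act = rng.normal(size=d_model)
print(f"trait projection before steering: {float(new_act @ persona_vector):.3f}")

# Steering: add a scaled copy of the vector to push the activation toward
# (or, with a negative sign, away from) the trait.
steering_strength = 4.0
steered_act = new_act + steering_strength * persona_vector
print(f"trait projection after steering:  {float(steered_act @ persona_vector):.3f}")
```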