Daniela Amodei (@danielaamodei)'s Twitter Profile
Daniela Amodei

@danielaamodei

President @AnthropicAI. Formerly @OpenAI, @Stripe, congressional staffer, global development

ID: 373531234

Link: http://anthropic.com · Joined: 14-09-2011 19:02:13

29 Tweets

8.8K Followers

289 Following

Anthropic (@anthropicai):

Our first AI alignment paper, focused on simple baselines and investigations: A General Language Assistant as a Laboratory for Alignment arxiv.org/abs/2112.00861

Anthropic (@anthropicai):

Our first interpretability paper presents a mathematical framework for reverse engineering transformer language models: A Mathematical Framework for Transformer Circuits: transformer-circuits.pub/2021/framework…

Anthropic (@anthropicai):

Our first societal impacts paper explores the technical traits of large generative models and the motivations and challenges people face in building and deploying them: arxiv.org/abs/2202.07785

Anthropic (@anthropicai):

In our second interpretability paper, we revisit “induction heads”. In 2+ layer transformers these pattern-completion heads form exactly when in-context learning abruptly improves. Are they responsible for most in-context learning in large transformers? transformer-circuits.pub/2022/in-contex…
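The pattern-completion rule described above ([A][B] … [A] → [B]) can be illustrated without any model at all. This is a hypothetical, model-free sketch of the behavior the tweet attributes to induction heads, not Anthropic's code:

```python
# Model-free sketch of the induction-head pattern-completion rule:
# on seeing token [A] again, predict the token [B] that followed the
# most recent earlier occurrence of [A] in the context.

def induction_predict(context):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the current (last) token in the context."""
    current = context[-1]
    # Scan backwards over earlier positions for a previous match.
    for i in range(len(context) - 2, -1, -1):
        if context[i] == current and i + 1 < len(context):
            return context[i + 1]
    return None  # no earlier occurrence: the rule makes no prediction

tokens = ["the", "cat", "sat", "on", "the"]
print(induction_predict(tokens))  # prints "cat"
```

In a real transformer this copying behavior is implemented by attention heads rather than an explicit scan, which is what makes their abrupt formation during training notable.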

Anthropic (@anthropicai):

On the @FLIxrisk podcast, we discuss AI research, AI safety, and what it was like starting Anthropic during COVID. futureoflife.org/2022/03/04/dan…

Anthropic (@anthropicai):

We've trained a natural language assistant to be more helpful and harmless by using reinforcement learning with human feedback (RLHF). arxiv.org/abs/2204.05862
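The tweet names RLHF but gives no implementation detail. One standard ingredient of RLHF pipelines is a reward model trained on human comparisons with a pairwise (Bradley-Terry) loss; the sketch below shows that loss in isolation, with illustrative names, and is not drawn from the paper itself:

```python
# Sketch of the pairwise preference loss commonly used to train an
# RLHF reward model on human comparisons (Bradley-Terry formulation).
import math

def preference_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected): low when the reward model
    scores the human-preferred response above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss falls as the preferred response is scored increasingly higher.
print(round(preference_loss(2.0, 0.0), 4))  # prints 0.1269
print(round(preference_loss(0.0, 0.0), 4))  # prints 0.6931 (= ln 2 when indifferent)
```

The trained reward model then supplies the reinforcement-learning signal used to fine-tune the assistant.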

Anthropic (@anthropicai):

In a new paper, we show that repeating only a small fraction of the data used to train a language model (albeit many times) can damage performance significantly, and we observe a "double descent" phenomenon associated with this. arxiv.org/abs/2205.10487

Anthropic (@anthropicai):

Transformer MLP neurons are challenging to understand. We find that using a different activation function (Softmax Linear Units or SoLU) increases the fraction of neurons that appear to respond to understandable features without any performance penalty. transformer-circuits.pub/2022/solu/inde…
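The paper defines SoLU as x · softmax(x), applied across the MLP's hidden dimension (the paper pairs it with an extra LayerNorm afterwards, omitted here). A minimal numpy sketch of the activation itself:

```python
# SoLU (Softmax Linear Unit): SoLU(x) = x * softmax(x). The softmax
# factor amplifies the largest pre-activations and suppresses the rest,
# pushing each neuron toward responding to a single feature.
import numpy as np

def solu(x):
    """Apply x * softmax(x) along the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))  # numerically stable softmax
    return x * (e / e.sum(axis=-1, keepdims=True))

x = np.array([3.0, 1.0, -2.0])
print(np.round(solu(x), 3))  # the largest entry dominates the output
```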

Anthropic (@anthropicai):

In "Language Models (Mostly) Know What They Know", we show that language models can evaluate whether what they say is true, and predict ahead of time whether they'll be able to answer questions correctly. arxiv.org/abs/2207.05221

Anthropic (@anthropicai):

Neural networks often pack many unrelated concepts into a single neuron – a puzzling phenomenon known as 'polysemanticity', which makes interpretability much more challenging. In our latest work, we build toy models where the origins of polysemanticity can be fully understood.
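One geometric reason polysemanticity can arise: when a model must represent more features than it has dimensions, the feature directions cannot all be orthogonal, so some directions must interfere. The layout below is a tiny illustration of that constraint, not the paper's trained toy models:

```python
# Six unit-length feature directions in a 2-dimensional space cannot be
# mutually orthogonal, so any linear embedding of them must interfere:
# several features end up sharing directions ("superposition").
import numpy as np

n, d = 6, 2
angles = np.linspace(0, np.pi, n, endpoint=False)       # spread 6 directions in 2D
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (6, 2), unit rows

gram = W @ W.T                          # pairwise overlaps between feature directions
off_diag = np.abs(gram - np.eye(n))
print(round(off_diag.max(), 3))         # prints 0.866: adjacent features overlap
```

With sparse features the interference rarely fires at the same time, which is why a model can "afford" to store features this way.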

Anthropic (@anthropicai):

Introducing Claude 2! Our latest model has improved performance in coding, math, and reasoning. It can produce longer responses and is available on a new public-facing beta website, claude.ai, in the US and UK.
