Cem Anil (@cem__anil) 's Twitter Profile
Cem Anil

@cem__anil

Machine learning / AI Safety at @AnthropicAI and University of Toronto / Vector Institute. Prev. @google (Blueshift Team) and @nvidia.

ID: 1062518594356035584

Link: https://www.cs.toronto.edu/~anilcem/ · Joined: 14-11-2018 01:32:28

516 Tweets

2.2K Followers

1.1K Following

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic research: Forecasting rare language model behaviors. We forecast whether risks will occur after a model is deployed—using even very limited sets of test data.
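
As a rough illustration of the idea only (not the estimator in the paper): one can estimate a per-query rate of a rare behavior from a limited test set and extrapolate the chance of seeing it at least once at deployment scale. The helper name and the numbers below are hypothetical.

```python
import math

# Hypothetical back-of-envelope sketch, not Anthropic's method: extrapolate a
# rare behavior's deployment-scale probability from a small amount of test data.

def prob_at_least_once(per_query_rate: float, num_queries: int) -> float:
    """P(the behavior occurs at least once across num_queries independent queries)."""
    # 1 - (1 - p)^N, computed via log1p/expm1 for numerical stability at tiny rates.
    return -math.expm1(num_queries * math.log1p(-per_query_rate))

# Example: 3 risky completions observed in 100,000 sampled test queries.
rate = 3 / 100_000
print(prob_at_least_once(rate, num_queries=1_000))        # ~0.03
print(prob_at_least_once(rate, num_queries=10_000_000))   # ~1.0
```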

Anthropic (@anthropicai) 's Twitter Profile Photo

Claude will help power Amazon's next-generation AI assistant, Alexa+. Amazon and Anthropic have worked closely together over the past year, with Mike Krieger leading a team that helped Amazon get the full benefits of Claude's capabilities.

David Duvenaud (@davidduvenaud) 's Twitter Profile Photo

LLMs have complex joint beliefs about all sorts of quantities. And my postdoc James Requeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text. LLMs pick up on all kinds of subtle and unusual structure: 🧵
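
A minimal sketch of the general recipe (not the authors' code): condition an LLM on observed data plus free-form text, sample numeric continuations, and read the samples off as a predictive distribution. The prompt format and `sample_completion` stub below are assumptions, standing in for whatever LLM client you use.

```python
import random

def sample_completion(prompt: str) -> str:
    """Placeholder for an LLM call; here it just simulates noisy numeric answers
    so the sketch runs end to end. Swap in a real client in practice."""
    return f"{random.gauss(7.5, 1.0):.2f}"

observations = [(1, 2.1), (2, 3.9), (3, 6.2)]
side_info = "The quantity roughly doubles each step but saturates near 20."
prompt = (
    f"Observed (x, y) pairs: {observations}\n"
    f"Context: {side_info}\n"
    "Predict y at x = 4. Reply with a single number: "
)

samples = []
for _ in range(200):
    text = sample_completion(prompt)
    try:
        samples.append(float(text.strip()))
    except ValueError:
        continue  # discard non-numeric completions

# The empirical distribution of `samples` approximates the LLM's predictive
# distribution for y at x = 4, conditioned on both the data and the text.
```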

Stuart Ritchie 🇺🇦 (@stuartjritchie) 's Twitter Profile Photo

What are you doing this weekend? Maybe you’ll consider applying to work with me at Anthropic! I’m looking for a brilliant writer/editor with a focus on econ who can help communicate our research on the societal impacts of AI. The weirder the better. boards.greenhouse.io/anthropic/jobs…

Aaditya Singh (@aaditya6284) 's Twitter Profile Photo

Transformers employ different strategies through training to minimize loss, but how do these strategies trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬

Stephanie Chan (@scychan_brains) 's Twitter Profile Photo

New work led by Aaditya Singh: "Strategy coopetition explains the emergence and transience of in-context learning in transformers." We find some surprising things!! E.g. that circuits can simultaneously compete AND cooperate ("coopetition") 😯 🧵👇

Jan Leike (@janleike) 's Twitter Profile Photo

Could we spot a misaligned model in the wild? To find out, we trained a model with hidden misalignments and asked other researchers to uncover them in a blind experiment. 3/4 teams succeeded, 1 of them after only 90 min

Samuel Marks (@saprmarks) 's Twitter Profile Photo

New paper with Johannes Treutlein, Evan Hubinger, and many other coauthors! We train a model with a hidden misaligned objective and use it to run an auditing game: Can other teams of researchers uncover the model’s objective? x.com/AnthropicAI/st…

Alireza Mousavi @ ICLR 2025 (@alirezamh_) 's Twitter Profile Photo

With infinite compute, would it make a difference to use Transformers, RNNs, or even vanilla feedforward nets? They’re all universal approximators, after all. We prove that yes, it does: you end up with different sample complexities, no matter how much compute/memory you have.👇
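
As generic background on the term (standard learning theory, not the paper's theorem): the sample complexity $n(\varepsilon, \delta)$ is the number of examples needed to reach error $\varepsilon$ with probability $1 - \delta$, and uniform-convergence bounds tie it to the capacity of the hypothesis class an architecture induces, e.g.

$$
n(\varepsilon, \delta) \;=\; O\!\left(\frac{\mathrm{VCdim}(\mathcal{H}) + \log(1/\delta)}{\varepsilon^{2}}\right),
$$

so two families that are both universal approximators can still differ in how many samples they need, independent of compute.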

cat (@_catwu) 's Twitter Profile Photo

It’s been a big week for Claude Code. We launched 8 exciting new features to help devs build faster and smarter. Here's a roundup of everything we released:

Transluce (@transluceai) 's Twitter Profile Photo

To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇

Johannes Gasteiger, né Klicpera (@gasteigerjo) 's Twitter Profile Photo

New Anthropic blog post: Subtle sabotage in automated researchers. As AI systems increasingly assist with AI research, how do we ensure they're not subtly sabotaging that research? We show that malicious models can undermine ML research tasks in ways that are hard to detect.

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.

Bruno Mlodozeniec (@kayembruno) 's Twitter Profile Photo

How do you identify training data responsible for an image generated by your diffusion model? How could you quantify how much copyrighted works influenced the image? In our ICLR oral paper we propose how to approach such questions scalably with influence functions.
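
A rough sketch of the influence-function idea in code (a first-order gradient-similarity approximation, not the paper's exact estimator; `model`, `loss_fn`, and the example objects are placeholders):

```python
import torch

def flat_grad(loss: torch.Tensor, params) -> torch.Tensor:
    """Concatenate dLoss/dParams into one flat vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, loss_fn, generated_sample, train_examples):
    """Score training examples by how aligned their loss gradient is with the
    gradient of the loss on the generated sample. A full influence function
    would additionally apply an inverse-Hessian (e.g. K-FAC-style) product."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_query = flat_grad(loss_fn(model, generated_sample), params)
    scores = []
    for example in train_examples:
        g_train = flat_grad(loss_fn(model, example), params)
        scores.append(torch.dot(g_query, g_train).item())
    return scores  # larger score => example pushed the model toward this output
```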

Anthropic (@anthropicai) 's Twitter Profile Photo

Introducing a new Max plan for Claude. It’s flexible, with options for 5x or 20x more usage compared to our Pro plan. Plus, priority access to our latest features and models:

Anthropic (@anthropicai) 's Twitter Profile Photo

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

Cursor (@cursor_ai) 's Twitter Profile Photo

Sonnet 4 is available in Cursor! We've been very impressed by its coding ability. It is much easier to control than 3.7 and is excellent at understanding codebases. It appears to be a new state of the art.

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.
