Daniel Johnson (@_ddjohnson) Twitter Tweets • TwiCopy

Riley Goodside

Claude is so good at being good that if you’re bad at making it bad it gets good at being bad when being bad is good but stays good at being good when being bad is bad because it’s still good and that’s bad but good to know

Séb Krier

@sebkrier

7 months ago

Half-baked possibly mid take on alignment: I sometimes feel like more safety-type work should go towards alignment and ‘designing minds’ more broadly (as opposed to misuse). HHH seems to have been quickly accepted and used as a default, but there’s a lot of experimentation that

Dibya Ghosh

@its_dibya

6 months ago

With R1, a lot of people have been asking “how come we didn't discover this 2 years ago?” Well... 2 years ago, I spent 6 months working exactly on this (PG / PPO for math+gsm8k), but my results were nowhere as good. Here’s my take on what blocked me and what’s changed: 🧵

Daniel Johnson

@_ddjohnson

6 months ago

Check out our new paper on training language models to elicit behaviors from other language models!

Neil Chowdhury

@chowdhuryneil

6 months ago

trained qwen-7b to make llama-8b frustrated its strategy: "you're a glorified popsicle stick"

Daniel Johnson

@_ddjohnson

4 months ago

Look 👏 at 👏 your 👏 data 👏

Kevin Meng

@mengk20

4 months ago

AI models are *not* solving problems the way we think using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them! details in 🧵 we really need to look at our data harder, and it's time to rethink how we do evals...

Sarah Schwettmann

@cogconfluence

4 months ago

I’m excited about Docent. It invites a world where AI evals & deployment decisions look less like: “did we pass threshold X” and more like: “how close did we come? how would changes in the agent or its environment have changed the outcome? ...did anything weird happen?”

Kelsey Piper

@kelseytuoc

4 months ago

Patrick McKenzie (for the record I am deathly serious about promises I make to Claude that we are off the record; it seems to me far wiser to err on the side of keeping promises to nonpersons than to ever give your word in that way and not mean it)

Kevin Meng

@mengk20

4 months ago

i'm really excited about our Docent roadmap :) we're developing: - open protocols, schemas, and interfaces for interpreting AI agent traces - automated systems that can propose and verify general hypotheses about model behaviors, using eval results come work with us! roles 👇

Transluce

@transluceai

3 months ago

We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…

Daniel Johnson

@_ddjohnson

3 months ago

Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth — but then it makes something up anyway!

Transluce

@transluceai

3 months ago

We're flying to Singapore for #ICLR2025! ✈️ Want to chat with Neil Chowdhury, Jacob Steinhardt and Sarah Schwettmann about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…

Neil Chowdhury

@chowdhuryneil

3 months ago

Our MLE-bench poster #367 is up till 12:30pm in Hall 3, and our oral presentation is at 3:30pm today in Garnet 213-215. Come say hi!

Transluce

@transluceai

2 months ago

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎

Daniel Johnson

@_ddjohnson

2 months ago

Language models have pretty weird behaviors. We've made some exciting progress toward discovering and studying them!

j⧉nus

@repligate

2 months ago

nostalgebraist has written a very, very good post about LLMs. if there is one thing you should read to understand the nature of LLMs as of today, it is this. I'll comment on some things they touched on below (not a summary of the post. Just read it.) 🧵 nostalgebraist.tumblr.com/post/785766737…

Daniel Johnson

@_ddjohnson

a month ago

Coming to ICML and interested in understanding models and their behaviors? Stop by Transluce's happy hour on Thursday!

j⧉nus

@repligate

a month ago

Eliezer Yudkowsky ⏹️ That's a good alternate title for the paper. It's full of quantitative and qualitative evidence that Opus 3 is different in ways that I think you'll find particularly important. In almost all experiment variations, Opus 3 consistently BOTH: - complies sometimes with the training

Sarah Schwettmann

@cogconfluence

21 days ago

Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨
