Jacob Steinhardt (@jacobsteinhardt) Twitter Tweets • TwiCopy

Wojciech Zaremba

8 months ago

We're entering an era where AI outputs are becoming so vast, humans alone can't analyze them. Today's LLMs produce tens of thousands of tokens per task—but complex challenges like comprehensive cancer research, inventing novel molecules, or building entire codebases will soon


Kevin Meng

@mengk20

8 months ago

i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇


Sarah Schwettmann

@cogconfluence

8 months ago

these are pretty special roles, I can't recommend working with Kevin Meng, vincent and the rest of the Transluce team enough 🫡 come join us! 👇


Ruiqi Zhong

@zhongruiqi

7 months ago

Finished my dissertation!!! (scalable oversight,link below) Very fortunate to have Jacob Steinhardt and Dan Klein as my advisors! Words can't describe my gratitude, so I used a pic of Frieren w/ her advisor :) Thanks for developing my research mission, and teaching me magic


Ruiqi Zhong

@zhongruiqi

7 months ago

Gradually we will realize it's not hard to get AI to be more capable, but to get them to do what we want :) so scalable oversight is the key bottleneck :) a lot of conceptually interesting qs, which means research opportunities!! (slides from my dissertation)


Transluce

@transluceai

7 months ago

We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…


Transluce

@transluceai

7 months ago

Update: this behavior seems to replicate in o3 deployed in ChatGPT. Unlike the o3 model we evaluated using the API, o3 in ChatGPT does have access to a Python tool. But ChatGPT still seems to think it’s running code on its own MacBook Pro! 👇(1/)


Daniel Johnson

@_ddjohnson

7 months ago

Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth — but then it makes something up anyway!


Ethan Perez

@ethanjperez

7 months ago

Transluce is killing it. Very cool/insightful findings in this thread. Their tool for automatically finding weird model behaviors (Docent) is one of those projects I wish I had thought to do, and looks quite useful for improving models


Zitong Yang

@zitongyang0

7 months ago

Synthetic Continued Pretraining (arxiv.org/pdf/2409.07431) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.


Transluce

@transluceai

7 months ago

We're flying to Singapore for #ICLR2025! ✈️ Want to chat with Neil Chowdhury, Jacob Steinhardt and Sarah Schwettmann about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…


Ruiqi Zhong

@zhongruiqi

6 months ago

Last day of PhD!
I pioneered using LLMs to explain dataset&model. It's used by interp at OpenAI and societal impact Anthropic
Tutorial here. It's a great direction & someone should carry the torch :)
Thesis available, if you wanna read my acknowledgement section=P


Percy Liang

@percyliang

6 months ago

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:


LawZero - LoiZéro

@lawzero_

5 months ago

Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch LawZero - LoiZéro, a nonprofit dedicated to advancing safe-by-design AI. lawzero.org


Transluce

@transluceai

5 months ago

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎


Neil Chowdhury

@chowdhuryneil

5 months ago

Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.


Meena Jagadeesan

@mjagadeesan25

5 months ago

I'm so excited to be joining Penn as an Assistant Professor in CS (Penn Computer and Information Science) in Fall 2026! I’ll be working on machine learning ecosystems, aiming to steer how multi-agent interactions shape performance trends and societal outcomes. I’ll be recruiting PhD students this cycle!


Transluce

@transluceai

5 months ago

Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 lu.ma/1w854pjn


Quentin Anthony

@quentinanthon15

4 months ago

I was one of the 16 devs in this study. I wanted to speak on my opinions about the causes and mitigation strategies for dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI-speedup on my assigned issues. I think transparency helps the community.


Transluce

@transluceai

4 months ago

We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us:
Monday: Jacob (Jacob Steinhardt) speaking at Post-AGI Civilizational Equilibria (post-agi.org)
Wednesday: Sarah (Sarah Schwettmann) speaking at WiML at 10:15 and as a panelist at 11am
