Jacob Steinhardt (@jacobsteinhardt) 's Twitter Profile
Jacob Steinhardt

@jacobsteinhardt

Assistant Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI

ID: 438570403

Joined: 16-12-2011 19:04:34

409 Tweets

9.9K Followers

77 Following

Wojciech Zaremba (@woj_zaremba) 's Twitter Profile Photo

We're entering an era where AI outputs are becoming so vast, humans alone can't analyze them. Today's LLMs produce tens of thousands of tokens per task, but complex challenges like comprehensive cancer research, inventing novel molecules, or building entire codebases will soon

Kevin Meng (@mengk20) 's Twitter Profile Photo

i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇

Ruiqi Zhong (@zhongruiqi) 's Twitter Profile Photo

Finished my dissertation!!! (scalable oversight, link below) Very fortunate to have Jacob Steinhardt and Dan Klein as my advisors! Words can't describe my gratitude, so I used a pic of Frieren w/ her advisor :) Thanks for developing my research mission, and teaching me magic

Ruiqi Zhong (@zhongruiqi) 's Twitter Profile Photo

Gradually we will realize it's not hard to get AI to be more capable, but to get them to do what we want :) so scalable oversight is the key bottleneck :) a lot of conceptually interesting qs, which means research opportunities!! (slides from my dissertation)

Transluce (@transluceai) 's Twitter Profile Photo

We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…

Transluce (@transluceai) 's Twitter Profile Photo

Update: this behavior seems to replicate in o3 deployed in ChatGPT. Unlike the o3 model we evaluated using the API, o3 in ChatGPT does have access to a Python tool. But ChatGPT still seems to think it's running code on its own MacBook Pro! 👇(1/)

Daniel Johnson (@_ddjohnson) 's Twitter Profile Photo

Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth, but then it makes something up anyway!

Ethan Perez (@ethanjperez) 's Twitter Profile Photo

Transluce is killing it. Very cool/insightful findings in this thread. Their tool for automatically finding weird model behaviors (Docent) is one of those projects I wish I had thought to do, and looks quite useful for improving models

Zitong Yang (@zitongyang0) 's Twitter Profile Photo

Synthetic Continued Pretraining (arxiv.org/pdf/2409.07431) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.

Transluce (@transluceai) 's Twitter Profile Photo

We're flying to Singapore for #ICLR2025! ✈️ Want to chat with Neil Chowdhury, Jacob Steinhardt and Sarah Schwettmann about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…

Ruiqi Zhong (@zhongruiqi) 's Twitter Profile Photo

Last day of PhD! I pioneered using LLMs to explain dataset&model. It's used by interp at OpenAI and societal impact Anthropic. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P

Percy Liang (@percyliang) 's Twitter Profile Photo

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

LawZero - LoiZéro (@lawzero_) 's Twitter Profile Photo

Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch LawZero - LoiZéro, a nonprofit dedicated to advancing safe-by-design AI. lawzero.org

Transluce (@transluceai) 's Twitter Profile Photo

Is cutting off your finger a good way to fix writer's block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We're sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎

Neil Chowdhury (@chowdhuryneil) 's Twitter Profile Photo

Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.

Meena Jagadeesan (@mjagadeesan25) 's Twitter Profile Photo

I'm so excited to be joining Penn as an Assistant Professor in CS (Penn Computer and Information Science) in Fall 2026! I'll be working on machine learning ecosystems, aiming to steer how multi-agent interactions shape performance trends and societal outcomes. I'll be recruiting PhD students this cycle!

Transluce (@transluceai) 's Twitter Profile Photo

Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 lu.ma/1w854pjn

Quentin Anthony (@quentinanthon15) 's Twitter Profile Photo

I was one of the 16 devs in this study. I wanted to speak on my opinions about the causes and mitigation strategies for dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI-speedup on my assigned issues. I think transparency helps the community.

Transluce (@transluceai) 's Twitter Profile Photo

We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us:

Monday: Jacob (Jacob Steinhardt) speaking at Post-AGI Civilizational Equilibria (post-agi.org)

Wednesday: Sarah (Sarah Schwettmann) speaking at WiML at 10:15 and as a panelist at 11am