Justin Zhao (@justinxzhao)'s Twitter Profile
Justin Zhao

@justinxzhao

On a career break! Previously ML Lead @Predibase, R-SWE @GoogleAI, CS/Music @Columbia. Tweeting about AI, evals, synthetic data, and my side projects.

ID: 1931326447

Link: http://justinxzhao.com

Joined: 03-10-2013 16:32:33

162 Tweets

278 Followers

334 Following

Bob McGrew (@bobmcgrewai):

People think that automating jobs will be easy, but they're wrong. You can’t just ask the AI to do things. You need to understand what your employee is doing - instructions, evals, monitoring. You have to make the role legible. Only then can you know AI will do the job well.

Siddharth Ramakrishnan (@siddharthvader_):

another installment of non-determinism evals with Justin Zhao! we ran an experiment with claude, making 100 API calls per query to test consistency with numerical data like population figures, GDP, and measurements. results below were interesting
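A minimal sketch of how consistency across repeated calls could be scored. The tweet does not say how the experiment measured it, so everything here is an assumption: the helper names `extract_number` and `consistency_report` are hypothetical, "agreement" is defined as the share of responses matching the most common value, and the responses are passed in as plain strings rather than fetched from any real API.

```python
import re
from collections import Counter


def extract_number(text):
    """Return the first number found in text as a float, or None if there is none.

    Hypothetical helper: thousands separators are stripped, units are ignored.
    """
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def consistency_report(responses):
    """Summarize how often repeated responses to one query agree on a number."""
    if not responses:
        raise ValueError("need at least one response")
    values = [extract_number(r) for r in responses]
    counts = Counter(values)
    # The most common extracted value and how many responses produced it.
    modal_value, modal_count = counts.most_common(1)[0]
    return {
        "n": len(values),
        "distinct": len(counts),
        "modal_value": modal_value,
        "agreement": modal_count / len(values),
    }


# Illustrative, made-up responses to a single population query.
sample = [
    "The population is 8,336,817.",
    "About 8,336,817 people",
    "Roughly 8.3 million",
]
report = consistency_report(sample)
```

With 100 responses per query, an `agreement` close to 1.0 would mean the model almost always returns the same figure. Note that this sketch treats "8.3 million" and "8,336,817" as different answers, which a real evaluation might want to normalize.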

Alex Dimakis (@alexgdimakis):

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and

Justin Zhao (@justinxzhao):

Love the idea of presenting your work as someone else's as a way of getting past sycophancy, which seems to be getting worse these days. I suppose most LLM-as-a-Judge setups embody this inherently, presenting outputs for rating as those written by anonymous third parties.

Justin Zhao (@justinxzhao):

"AI as Normal Technology" knightcolumbia.org/content/ai-as-… also advocates the idea that the impact of superintelligence will be extremely gradual because knowing how to improve requires 1) implementing and 2) getting feedback from the real world, both of which are slow.

Justin Zhao (@justinxzhao):

In a world with AI, doing isn’t the hard part anymore. The hard part is trusting. Reviewing. Verifying. Embracing. Deciding what matters. These are the bottlenecks now, and they are deeply human. We talk about AI agents accelerating science and automating research, and they