Justin Zhao (@justinxzhao) Twitter Tweets • TwiCopy

Justin Zhao

@justinxzhao

On a career break! Previously ML Lead @Predibase, R-SWE @GoogleAI, CS/Music @Columbia. Tweeting about AI, evals, synthetic data, and my side projects.

ID: 1931326447

Website: http://justinxzhao.com

Joined: 03-10-2013 16:32:33

162 Tweets

278 Followers

334 Following

Bob McGrew

People think that automating jobs will be easy, but they're wrong. You can’t just ask the AI to do things. You need to understand what your employee is doing - instructions, evals, monitoring. You have to make the role legible. Only then can you know AI will do the job well.

490 likes · 56 replies · 25 retweets

Justin Zhao

@justinxzhao

4 months ago

Give LLMs access to free entropy, and they will kinda make use of it. 💭

2 likes · 0 replies · 0 retweets

Siddharth Ramakrishnan

@siddharthvader_

4 months ago

another installment of non-determinism evals with Justin Zhao! we ran an experiment with claude, making 100 API calls per query to test consistency with numerical data like population figures, GDP, and measurements. results below were interesting

1 like · 1 reply · 1 retweet

Justin Zhao

@justinxzhao

4 months ago

Feel bad for all the people who actually write with em dashes — only to get accused of AI slop.

1 like · 0 replies · 0 retweets

Amanda Cercas Curry

@curriedamanda

3 months ago

🚨🚨🚨 Justin Zhao just presented our paper and is around #NAACL2025 if you want to have a chat!!

12 likes · 0 replies · 4 retweets

Alex Dimakis

@alexgdimakis

2 months ago

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and

1.1K likes · 38 replies · 190 retweets

Justin Zhao

@justinxzhao

2 months ago

Love the idea of presenting your work as someone else's as a way of getting past sycophancy, which seems to be getting worse these days. I suppose most LLM-as-a-Judge setups embody this inherently, presenting outputs for rating as those written by anonymous third parties.

2 likes · 1 reply · 0 retweets

Justin Zhao

@justinxzhao

a month ago

🥳🥳

1 like · 0 replies · 0 retweets

Justin Zhao

@justinxzhao

25 days ago

"AI as Normal Technology" knightcolumbia.org/content/ai-as-… also advocates the idea that the impact of superintelligence will be extremely gradual because knowing how to improve requires 1) implementing and 2) getting feedback from the real world, both of which are slow.

1 like · 1 reply · 0 retweets

Justin Zhao

@justinxzhao

18 days ago

Editing your reward function is like publishing an amendment to your model's constitution.

1 like · 0 replies · 0 retweets

Justin Zhao

@justinxzhao

16 days ago

Big sister energy. Congratulations!

1 like · 0 replies · 0 retweets

Sam Paech

@sam_paech

12 days ago

Kimi-K2 just took top spot on both EQ-Bench3 and Creative Writing! Another win for open models. Incredible job Kimi.ai

711 likes · 28 replies · 88 retweets

Justin Zhao

@justinxzhao

3 days ago

In a world with AI, doing isn’t the hard part anymore. The hard part is trusting. Reviewing. Verifying. Embracing. Deciding what matters. These are the bottlenecks now, and they are deeply human. We talk about AI agents accelerating science and automating research, and they

1 like · 0 replies · 0 retweets
