Chinmay Deshpande (@chinmay_deshp)'s Twitter Profile
Chinmay Deshpande

@chinmay_deshp

AI Governance @CenDemTech | Previously @Harvard

ID: 1547008226872393728

Joined: 13-07-2022 00:01:24

15 Tweets

54 Followers

177 Following

Senator Scott Wiener (@scott_wiener)'s Twitter Profile Photo

Today’s report on AI Governance in CA builds on the urgent conversations around AI governance we began in the Legislature last year. I thank Fei-Fei Li, Jennifer Chayes, and Tino Cuellar for the hard work and keen insight they provide in this urgent report.

METR (@metr_evals)'s Twitter Profile Photo


When will AI systems be able to carry out long projects independently?

In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
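To make the trend concrete: a seven-month doubling compounds fast. A minimal Python sketch of the projection, assuming (purely for illustration, not a METR figure) a one-hour task horizon today:

DOUBLING_PERIOD_MONTHS = 7  # METR's reported doubling time for AI task length

def projected_horizon_hours(start_hours: float, months_ahead: float) -> float:
    # Task length an AI can complete after `months_ahead` months of steady doubling.
    return start_hours * 2 ** (months_ahead / DOUBLING_PERIOD_MONTHS)

for months in (0, 7, 14, 21, 42):
    print(f"{months:>2} months: ~{projected_horizon_hours(1.0, months):.1f} hours")

Under that illustrative starting point, the horizon reaches a full eight-hour workday after three doublings, i.e. roughly 21 months.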
Kevin Meng (@mengk20)'s Twitter Profile Photo


AI models are *not* solving problems the way we think

using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them!

details in 🧵

we really need to look at our data harder, and it's time to rethink how we do evals...
Shakeel (@shakeelhashim)'s Twitter Profile Photo


NEW on Transformer: Where's Gemini 2.5 Pro's system card?

Google previously promised governments that it would publish such information — but with the latest model, it's nowhere to be found.
Kevin Bankston (@kevinbankston)'s Twitter Profile Photo

Notable also that OpenAI's Deep Research system card came out over two weeks after the model dropped. Good rule of thumb: if your AI lab is releasing models too fast to even keep up with your own voluntary transparency commitments, then you are moving too fast, period.

Kevin Bankston (@kevinbankston)'s Twitter Profile Photo

Even the most basic AI transparency is falling by the wayside in the competitive crush. OpenAI puts out a model card weeks late for Deep Research; Google doesn't publish any at all for Gemini 2 or 2.5; now Meta puts out the shortest, vaguest one I've ever seen with Llama 4. Sad!

Thomas Woodside 🫜 (@thomas_woodside)'s Twitter Profile Photo

"Especially to the extent AI developers continue to stumble in these commitments, it will be incumbent on lawmakers to develop and enforce clear transparency requirements that the companies can’t shirk." -- Kevin Bankston Agree! The time for these requirements is now.

"Especially to the extent AI developers continue to stumble in these commitments, it will be incumbent on lawmakers to develop and enforce clear transparency requirements that the companies can’t shirk." -- <a href="/KevinBankston/">Kevin Bankston</a>

Agree! The time for these requirements is now.
Dylan HadfieldMenell (@dhadfieldmenell)'s Twitter Profile Photo

I remember talking about competitive pressures and race conditions with OpenAI’s safety team in 2018 when I was an intern. It was part of a larger conversation about the company charter. It is sad to see OpenAI’s founding principles cave to pressures we predicted long ago.

Kevin Wei (he/they) (@kevinlwei)'s Twitter Profile Photo


🚨 New paper alert! 🚨

Are human baselines rigorous enough to support claims about "superhuman" performance?

Spoiler alert: often not!

<a href="/prpaskov/">Patricia Paskov</a> and I will be presenting our spotlight paper at ICML next week on the state of human baselines + how to improve them!
Kevin Bankston (@kevinbankston)'s Twitter Profile Photo

Think the upshot here is that we should certainly leverage chain of thought as low-hanging fruit for safety alignment, but also continue to invest in other, more direct methods of interpretability, since CoT can be unreliable. Yes-and, not either-or.