Lawrence Chan (@justanotherlaw)'s Twitter Profile
Lawrence Chan

@justanotherlaw

I do AI Alignment Research. Currently at @METR_Evals on leave from my PhD at UC Berkeley’s @CHAI_berkeley. Opinions are my own.

ID: 824308056351735809

Link: https://chanlawrence.me/
Joined: 25-01-2017 17:28:50

434 Tweets

1.1K Followers

158 Following

Lawrence Chan (@justanotherlaw)

Putting aside what this means for ML automation or agency overhang for a second, I was really impressed by Tao Lin's work here. Working basically alone, and using ~$50k in token + compute costs and 4 weeks of total engineering time, he substantially advanced SOTA on

METR (@metr_evals)

When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
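
A minimal sketch of what a fixed 7-month doubling time implies, assuming simple exponential growth. The function names and the 60-minute starting horizon are illustrative assumptions, not figures from the research.

```python
import math


def task_horizon(start_minutes: float, months_elapsed: float,
                 doubling_months: float = 7.0) -> float:
    """Task length an agent can handle after `months_elapsed` months,
    assuming the horizon doubles every `doubling_months` months."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)


def months_until(start_minutes: float, target_minutes: float,
                 doubling_months: float = 7.0) -> float:
    """Months needed for the horizon to grow from start to target."""
    return doubling_months * math.log2(target_minutes / start_minutes)


# Hypothetical starting point: a 60-minute task horizon today.
print(task_horizon(60, 14))   # two doublings
print(months_until(60, 480))  # three doublings (60 -> 480 minutes)
```

Under these assumptions, each additional 7 months multiplies the horizon by two, so reaching a task eight times longer takes three doubling periods.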

Toby Ord (@tobyordoxford)

Is there a half-life for the success rates of AI agents? I show that the success rates of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task. 🧵 1/
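
A rough sketch of the constant-failure-rate idea described in this post: if an agent has the same chance of failing during each minute of human-equivalent task time, its success rate decays exponentially with task length and has a half-life. The function names and the 60-minute half-life are hypothetical, chosen only for illustration.

```python
import math


def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Probability of finishing a task of the given human-time length,
    assuming success halves with every `half_life_minutes` of task length."""
    return 0.5 ** (task_minutes / half_life_minutes)


def failure_rate_per_minute(half_life_minutes: float) -> float:
    """Constant hazard rate (per minute) equivalent to the given half-life."""
    return math.log(2) / half_life_minutes


# Hypothetical agent with a 60-minute half-life.
for minutes in (15, 60, 120, 240):
    print(minutes, round(success_probability(minutes, 60), 3))
```

With a constant hazard rate `r`, the success probability is `exp(-r * t)`, which is the same curve as `0.5 ** (t / half_life)` when `r = ln(2) / half_life`.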

METR (@metr_evals)

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

david rein (@idavidrein)

I was pretty skeptical that this study was worth running, because I thought that *obviously* we would see significant speedup. x.com/METR_Evals/sta…

Chris Painter (@chrispainteryup)

A few months ago, METR had two projects going in parallel: a project experimenting with AI researcher interviews to track the degree of AI R&D acceleration/delegation, and this project. When the results started coming back from this project, we put the survey-only project on ice.