Joel Becker (@joel_bkr)'s Twitter Profile
Joel Becker

@joel_bkr

move fast and fix things @METR_evals. 'soccer'-me @MessiSeconds.

ID: 103418485

https://joel-becker.com/ · Joined 09-01-2010 23:46:25

1.1K Tweets

2.2K Followers

1.1K Following

Sydney (@sydneyvonarx)

The terms “CoT” and reasoning trace make it sound like the CoT is a summary of an LLM’s reasoning. But IMO it’s more accurate to view CoT as a tool models use to think better. CoT monitoring is about tracking how models use this tool so we can glean insight into their

Brian Jabarian (@brian_jabarian)

1/ In my job market paper (with Luca Henkel) in partnership with PSG Global Solutions, TP subsidiary, we tested AI-led interviews with 70,000 applicants.

Joel Becker (@joel_bkr)

we're going to research the impact of AI on totally greenfield software engineering. come be in our anonymized data/have fun/maybe win $15k, in SF, september 6! x.com/FactoryAI/stat…

Joel Becker (@joel_bkr)

thrilled to be on Patrick McKenzie's complex systems podcast! we talk about perception vs reality in the impact of AI on dev productivity, the industrial organization of software engineering, and much more complexsystemspodcast.com/episodes/the-g…

Charles Foster (@cfgeek)

> 🤖 Half of participants will build with AI tools
> 👩‍💻 Half of participants will build without AI tools
> Judging is blind and scored on speed, precision, and quality.

Patrick McKenzie (@patio11)

This week on Complex Systems I was thrilled to welcome METR's Joel Becker to chat about how we rigorously measure progress of LLMs, an interesting research result Joel and the team hit recently, and a tiny bit about the industrial organization of software engineering.