Arjun Guha (@arjunguha)'s Twitter Profile
Arjun Guha

@arjunguha

hacker / CS professor @KhouryCollege @neu_prl

ID: 305253652

Link: https://ccs.neu.edu/~arjunguha/

Joined: 25-05-2011 22:15:01

777 Tweets

1.1K Followers

197 Following

Leandro von Werra (@lvwerra)

Introducing DABStep: Data Agent Benchmark for multi-step reasoning. We teamed up with Adyen to test if current LLMs can solve *hard* and *real-world* data analysis tasks. TL;DR: No! They often fail to read the manual or debug errors. The best model only gets 16% right!

Arjun Guha (@arjunguha)

There is a fundamental misunderstanding here. PhD students do not complete assigned tasks. arstechnica.com/ai/2025/03/wha…

David Bau (@davidbau)

Why is interpretability the key to dominance in AI? Not winning the scaling race, or banning China. Our answer to OSTP/NSF, w/ Goodfire's Tom McGrath, Transluce's Sarah Schwettmann, and MIT's Dylan HadfieldMenell: resilience.baulab.info/docs/AI_Action… Here's why: 🧵 ↘️

Arjun Guha (@arjunguha)

This is a short note on my experience as an immigrant and new American. The timeline is this:
- 2002: moved from India to attend Grinnell College, Iowa
- 2006: started PhD in computer science at Brown University, Rhode Island
- 2012: started as a postdoc scholar at Cornell

Arjun Guha (@arjunguha)

I hire fewer TAs because I can rapidly complete tasks with GenAI that would require long back-and-forth with a TA. I have to validate LLM output, but 1) I read fast and 2) I have to validate junior TA output too. Negative effect is obvious: human TA training is good for

Arjun Guha (@arjunguha)

I’m considering returning to take-home, open-book exams. Also open Google and open ChatGPT. To make this work, the exam will have questions that are beyond the scope of the class. This is akin to a test of scalable oversight. Has anyone tried this?

Arjun Guha (@arjunguha)

The recent *Your Brain on ChatGPT* paper is cool, from the little that I understand of it. To this day, when an undergraduate approaches me to do research, I tell them to read a prefix of the PLAI book (1st ed.), code it up, and then demonstrate to me that they understand it. I

Arjun Guha (@arjunguha)

When a model reports a single score on MultiPL-E, which languages are being considered for the average? I don't think it's all 18, or the 22-25 now supported. Is it the seven languages that Code Llama decided to measure?

Arjun Guha (@arjunguha)

These kinds of statements are true for a tiny number of people. If you can figure out how to do research by yourself, that’s amazing. I needed a lot of training, and most people do. It’s great that we have a culture that both does not care about credentials, and also lets people

Khoury College of Computer Sciences (@khourycollege)

After building and burnishing their research chops at Khoury College, nine PhD graduates and former postdoctoral fellows are beginning their careers as professors this year. To hear more about their stories: bit.ly/4eBLedy

Arjun Guha (@arjunguha)

Flashback to useless content from introductory Java: I just wrote my own buffered line reader, with several non-standard bells and whistles. This is in service of PL and machine learning.