Graham Neubig (@gneubig)'s Twitter Profile
Graham Neubig

@gneubig

Associate professor at CMU, studying natural language processing and machine learning. Co-founder @allhands_ai

ID: 185910194

Link: http://www.phontron.com
Joined: 02-09-2010 03:25:38

3.3K Tweets

37.37K Followers

668 Following

All Hands AI (@allhands_ai):

The SWE-Bench Verified leaderboard has been updated, and OpenHands is both number one overall and the only open-source agent in the top 10! swebench.com

Read more about our approach of the OpenHands critic here: all-hands.dev/blog/sota-on-s…

Graham Neubig (@gneubig):

I'm going to PyCon US tomorrow! First time ever since hearing about it many years ago from my labmate David Cournapeau, author of scikit-learn. Looking forward to learning about all the latest developments in the Python ecosystem outside my bubble. Say hi if you're there 😃

All Hands AI (@allhands_ai):

We have a bunch of exciting features in OpenHands 0.38.0!

- Native Windows support (no WSL)
- Browser screenshots
- More customizability of sandboxes

We also released a Chrome extension allowing for one-click starts of OpenHands from GitHub!

Patrick Fernandes (@psanfernandes):

MT metrics excel at evaluating sentence translations, but struggle with complex texts.

We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them.

arxiv.org/abs/2504.07583

(co-lead Sweta Agrawal)

1/15

Graham Neubig (@gneubig):

Congrats to OpenAI on the Codex release, it's an exciting time for coding agents! We have some quick thoughts on the UX, accuracy on SWE-bench (it's pretty good!), and the focus on safety/security in the thread below. We'll continue adding notes to the thread as we take a closer look.

Graham Neubig (@gneubig):

Pretty great results immediately with Claude 4, and we can push it further (we haven't even prompt engineered yet). Gotta say that Anthropic did a good job on this one.