Graham Neubig (@gneubig) Twitter Tweets • TwiCopy

Graham Neubig

@gneubig

Associate professor at CMU, studying natural language processing and machine learning. Co-founder @allhands_ai

ID: 185910194

Link: http://www.phontron.com

Joined: 02-09-2010 03:25:38

3.3K Tweets

37.37K Followers

668 Following

All Hands AI

@allhands_ai

6 months ago

The SWE-Bench verified leaderboard has been updated and OpenHands is both number one overall, and the only open source agent in the top 10! swebench.com Read more about our approach of the OpenHands critic here: all-hands.dev/blog/sota-on-s…

138 likes, 3 replies, 25 retweets

Graham Neubig

@gneubig

6 months ago

I'm going to PyCon US tomorrow! First time ever since hearing about it many years ago from my labmate David Cournapeau, author of scikit-learn. Looking forward to learning about all the latest developments in the Python ecosystem outside my bubble. Say hi if you're there 😃

33 likes, 1 reply, 0 retweets

All Hands AI

@allhands_ai

6 months ago

We have a bunch of exciting features in OpenHands 0.38.0!
- Native Windows support (no WSL)
- Browser screenshots
- More customizability of sandboxes
We also released a Chrome extension allowing for one-click starts of OpenHands from GitHub!

29 likes, 1 reply, 3 retweets

Patrick Fernandes

@psanfernandes

6 months ago

MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them. arxiv.org/abs/2504.07583 (co-lead Sweta Agrawal) 1/15

34 likes, 2 replies, 11 retweets

Graham Neubig

@gneubig

6 months ago

Congrats to OpenAI on the codex release, it's an exciting time for coding agents! We have some quick thoughts on the UX, accuracy on SWE-bench (it's pretty good!), and the focus on safety/security in the thread below. We'll continue adding notes to the thread as we take a closer look.

49 likes, 3 replies, 3 retweets

Graham Neubig

@gneubig

6 months ago

Thanks to all the new users, and special thanks to the infra folks at All Hands AI who are keeping things afloat!

19 likes, 2 replies, 0 retweets

Mistral AI

@mistralai

6 months ago

Meet Devstral, our SOTA open model designed specifically for coding agents and developed with All Hands AI mistral.ai/news/devstral

3.3K likes, 102 replies, 431 retweets

All Hands AI

@allhands_ai

6 months ago

Devstral is the number one model on Hugging Face today 🎉 Thanks everyone for the support! huggingface.co/models

107 likes, 4 replies, 14 retweets

Graham Neubig

@gneubig

6 months ago

Pretty great results immediately with Claude 4, and we can push it further (we haven't even prompt engineered yet). Gotta say that Anthropic did a good job on this one.

81 likes, 1 reply, 4 retweets