
Kilian Lieret @ICLR
@klieret
Research Software Engineer at Princeton University. AI agents & benchmarks for software engineering.
ID: 1388792248100442112
https://github.com/klieret 02-05-2021 09:47:47
50 Tweets
460 Followers
35 Following

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? VideoGameBench evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇