XLANG NLP Lab (@xlangnlp) 's Twitter Profile
XLANG NLP Lab

@xlangnlp

developing embodied AI agents that empower users to use language to interact with digital and physical environments to carry out real-world tasks.

ID: 1678044379121057792

Link: https://xlang.ai

Joined: 09-07-2023 14:12:50

103 Tweets

894 Followers

27 Following

Bowen Wang (@bowenwangnlp) 's Twitter Profile Photo

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL and more 🕹️ Test agents on 100+ real apps & websites with one-click config 🔒 Safe & free

Tianbao Xie (@tianbaox) 's Twitter Profile Photo

Finally we are here! 👏 Check out our most open & fair benchmark⚔️ for computer use capability evaluation for the community.

Tao Yu (@taoyds) 's Twitter Profile Photo

🚀 After a year of development based on our OSWorld, Computer Use Agent Arena is LIVE! Test top AI agents (Operator, Claude 3.7...) on any kind of computer-use task with zero setup. Cloud-hosted, safe, and FREE! Try it now: arena.xlang.ai! Data & code coming soon!

XLANG NLP Lab (@xlangnlp) 's Twitter Profile Photo

👉 Compare and test Computer Use Agents (Operator, Claude 3.7...) on any kind of task on real computers 🚩without any setup or cost🚩! Try our Computer Use Agent Arena: arena.xlang.ai

XLANG NLP Lab (@xlangnlp) 's Twitter Profile Photo

🚀 Exciting news! OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at arena.xlang.ai! Join the community and dive in!

XLANG NLP Lab (@xlangnlp) 's Twitter Profile Photo

🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to arena.xlang.ai.

XLANG NLP Lab (@xlangnlp) 's Twitter Profile Photo

🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from Anthropic ties for #1 in Computer Agent Arena, followed by Operator from OpenAI & UI-TARS-1.5 from ByteDance, a ranking that differs significantly from prior benchmarks! Check the full rankings! 👉 arena.xlang.ai/leaderboard

Bowen Wang (@bowenwangnlp) 's Twitter Profile Photo

😀 Our initial leaderboard is finally out. Here I'd like to share a few interesting findings based on our case study: 1. Claude 3.7 Sonnet consistently performs best across diverse task types, particularly excelling at open-ended queries like “write a paper reading report.” 2,