Saaket Agashe @ NAACL 2025 (@saa1605)'s Twitter Profile
Saaket Agashe @ NAACL 2025

@saa1605

CSE PhD Student @ UC Santa Cruz.

ID: 379252208

Joined: 24-09-2011 16:43:23

20 Tweets

132 Followers

75 Following

Yue Fan (@yfan_ucsc)

🚀🚀🚀 Excited to share our latest breakthrough: Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding! 📍 Click ANYWHERE on the screen, and our Tree-of-Lens (ToL) agent will tell you what's there and where it's located. 🌟 As shown in the video,

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

🚀 Exciting news! Agent S will appear at #ICLR2025 in Singapore! 🌏 After 3 months post-release, it remains the SOTA open-source OS agent, now supporting Mac, Linux, Windows, and web browsers (integrated into our Simular Browser: simular.ai)! 🌐✨ Get started in

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

Congrats to Saaket Agashe @ NAACL 2025 for his LLM-Coordination paper being accepted to #NAACL2025 Findings! What a day for a new PhD student who just completed his first year! Two first-author papers being accepted at ICLR and NAACL, with hundreds of GitHub stars & citations already!

Yue Fan (@yfan_ucsc)

Tired of GUI grounding models failing in new apps? 🤔 We introduce GUI-Bee 🐝with RL-driven exploration (covering 51% more unique scenes compared to baselines) to help your GUI action grounding models conquer NOVEL environments!🚀 Key Highlights: ✅ We pioneer aligning GUI

Kaiwen Zhou (@kaiwenzhou9)

🛡️ R1 Safety Paper Alert! 📰 How safe are large reasoning models like R1? What is their safety behavior? Does their enhanced capability introduce greater risks? — We present a comprehensive safety analysis on large reasoning models: 🔥 Key Findings: 1️⃣Open-source R1 models lag

Qianqi "Jackie" Yan (@qianqi_yan)

New Paper Alert: Multimodal Inconsistency Reasoning (MMIR)! ✨ Ever visited a webpage where the text says “IKEA desk” yet images and descriptions elsewhere show a totally different brand? Or read a slide that shows “50% growth” in the text but the accompanying chart looks flat?

Saaket Agashe @ NAACL 2025 (@saa1605)

📢 Excited to present our poster at #ICLR2025! Agent S: An Open Agentic Framework that Uses Computers Like a Human. Come explore how Agent S leverages Experience Augmented Planning to interact with computers like humans do! 📍Hall 3 + Hall 2B, Poster #408 🗓️ April 26th, 10 AM

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

Our Agent S paper won the Best Paper Award at #ICLR2025 Agentic AI for Science Workshop! 🎉 Congrats to Simular Research team (Saaket Agashe, Jiuzhou Han, Ang Li). This is the most hands-on and committed project I’ve led since I started my faculty career. We’re just getting

Saaket Agashe @ NAACL 2025 (@saa1605)

I’ll be presenting our poster: “LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models” tomorrow at #NAACL2025! ⏰ 11 AM 📍 Hall 3 Drop by to chat about applications of LLMs for Multi-Agent Coordination! #MultiAgentAI #LLMs

Qianqi "Jackie" Yan (@qianqi_yan)

🚀 New paper out! “Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models” Real life is messy: 🔹 “Make a cup of apple juice” - but no apples are in sight 🔹 “Say hi to my friend” - yet two people are in the frame 🔹 “Tell me the brand of the lipstick” -

Qianqi "Jackie" Yan (@qianqi_yan)

We’re thrilled to launch the MMIR Challenge at the #ICCV2025 CLVL Workshop! 🧠 🖼️ Task: Detect inconsistencies in multimodal artifacts (webpages, slides, posters) 🏆 Top submissions invited to present in the non-archival track at CLVL 🔗 Compete now → kaggle.com/competitions/m…

Kabir (@kabirahuja004)

How does GPT-5 do on FlawedFictions? 🍩 On short stories, it reaches SoTA with CE-Eval = 0.70 (max 1), even above est. human performance. On long stories (FlawedFictionsLong), it still struggles at 0.47. We’ll present FlawedFictions at Conference on Language Modeling (Poster Session 2 Tuesday).

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

🚀 Introducing 𝐀𝐠𝐞𝐧𝐭 𝐒3, the most advanced computer-use agent, now 𝐚𝐩𝐩𝐫𝐨𝐚𝐜𝐡𝐢𝐧𝐠 𝐡𝐮𝐦𝐚𝐧-𝐥𝐞𝐯𝐞𝐥 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞🧠💻 Just one year ago, Agent S scored ~20% on OSWorld: SOTA then, but far from human 72%. Today, Agent S3 reaches 6̳9̳.̳9̳%̳ (⬆10% over

Simular (@simularai)

🚀 Simular at COLM 2025 — Presenting Agent S2 in Montréal! 🇨🇦 We’re excited to share that our research team — Xin Eric Wang, Vincent and Kyle — presented Agent S2: A Compositional Generalist–Specialist Framework for Computer Use Agents at COLM 2025 in Montréal! Agent S2
