OSU NLP Group (@osunlp)'s Twitter Profile
OSU NLP Group

@osunlp

Natural Language Processing Group at The Ohio State University directed by @ysu_nlp @hhsun1 @shocheen

ID: 1420575193039204354

Joined: 29-07-2021 02:41:41

415 Tweets

1.1K Followers

138 Following

Huan Sun (OSU) (@hhsun1)

I will miss #NAACL2025 unfortunately, but please check out our work on chemistry agents, "ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving" today (May 1) during 2:00-3:30pm (local time) at Hall 3, Poster Session 5! Some updates: We have renamed

ComputerUseAgents Workshop (@workshopcua)

⏳ Less than 1 day left to submit! 🔦 Speaker Spotlight Time! We’re thrilled to welcome Yu Su (@ysu_nlp), Distinguished Assistant Professor at The Ohio State University, as an invited speaker at the ICML 2025 Workshop on Computer Use Agents! His work bridges LLM agents, memory,

Huan Sun (OSU) (@hhsun1)

Super excited to get funded by Schmidt Sciences to study computer-use agents (CUAs) under adversarial attacks. Many thanks to the student leads including Zeyi Liao, Jaylen Jones, Linxi Jiang, and amazing co-PIs Yu Su and Zhiqiang Lin. As the capabilities of CUAs improve,

Yu Su @#ICLR2025 (@ysu_nlp)

Glad to get the 'little stamp' on my appointment letter one year ahead of the clock 🥰 It came just in time amid the peak of the AI hype week. With a bit more job security, now it's time to think about the next chapter of my career. How can one continue to make meaningful

Vardaan Pahuja (@vardaanpahuja)

🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:

Zeyi Liao (@liaozeyi)

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, Anthropic Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for

Huan Sun (OSU) (@hhsun1)

Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secure is … hard. Is Anthropic Claude 4 Opus more robust to indirect prompt injection than previous versions like Claude 3.7? Not really. Why hard?

Botao Yu (@botaoyu24)

🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall

Chan Hee (Luke) Song (@luke_ch_song)

Heading to #CVPR2025 to present our Oral paper with NVIDIA Robotics! 📅 June 14 (Sat) | 🕐 1:00 PM | 📍Oral Session 4B @ ExHall A2 I’ll also be at the 3D-VLA/VLM and EVAL-FoMo 2 workshops presenting the same work. Come say hi!

Yu Su @#ICLR2025 (@ysu_nlp)

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

Jianyang Gu (@vimar_gu)

It’s so exciting to see BioCLIP 2 demonstrate a biologically meaningful embedding space while being trained only to distinguish species. Can’t wait to see more applications of BioCLIP 2 in solving real-world problems. I’m attending #CVPR2025 in Nashville. Happy to chat about it!

Huan Sun (OSU) (@hhsun1)

Quizzing BioClip about an animal/plant has been another fun activity we do at a zoo/garden. Most of the time, it does get things right! Now check out BioClip 2, with much stronger performance and nice properties!

Yifei Li (@yifeilipku)

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

Huan Sun (OSU) (@hhsun1)

If you care about building AI co-scientists for data-driven discovery, check out our recent work on automatically collecting large-scale, authentic, high-quality scientific coding tasks at a low cost, led by Yifei Li, Hanane Moussa, OSU NLP Group. 🌟AutoSDT: Scaling Data-Driven

XLANG NLP Lab (@xlangnlp)

🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)!
🤔Which VLMs act better as computer use agents (CUAs)?

1, Claude Sonnet 4 🥇
2, Claude 3.7 Sonnet 🥈
3, UI-TARS-1.5 🥉
4, Operator

More insights in the thread 👇
arena.xlang.ai
OSU NLP Group (@osunlp)

Our group is known for producing widely adopted benchmarks (MMMU, Mind2Web, TravelPlanner, ScienceAgentBench etc.). Mind2Web 2 is probably the benchmark we spent the most time on ever. 26 authors spent over 6 months to tackle the emerging evaluation crisis head-on. Check it out!

Huan Sun (OSU) (@hhsun1)

Rigorously evaluating agentic systems has been one of our pursuits at OSU NLP Group, with prior efforts including Mind2Web and ScienceAgentBench. Today we introduce Mind2Web 2 to evaluate the emerging Deep Research-like agents: It features realistic and diverse long-horizon web

Tianci Xue (@xue_tianci)

Thrilled to announce that our work Online-Mind2Web has been accepted to Conference on Language Modeling! 🎉 It's my first PhD work and first paper at COLM. See you in Montreal! 🍁 Several teams are already testing their agents on Online-Mind2Web. If you're curious about how your agent performs, try