OSU NLP Group (@osunlp) Twitter Tweets • TwiCopy

I will miss #NAACL2025 unfortunately, but please check out our work on chemistry agents, "ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving" today (May 1) during 2:00-3:30pm (local time) at Hall 3, Poster Session 5! Some updates: We have renamed

ComputerUseAgents Workshop

@workshopcua

2 months ago

⏳ Less than 1 day left to submit! 🔦 Speaker Spotlight Time! We’re thrilled to welcome Yu Su (@ysu_nlp), Distinguished Assistant Professor at The Ohio State University, as an invited speaker at the ICML 2025 Workshop on Computer Use Agents! His work bridges LLM agents, memory,

Huan Sun (OSU)

@hhsun1

2 months ago

Super excited to get funded by Schmidt Sciences to study computer-use agents (CUAs) under adversarial attacks. Many thanks to the student leads including Zeyi Liao, Jaylen Jones, Linxi Jiang, and amazing co-PIs Yu Su and Zhiqiang Lin. As the capabilities of CUAs improve,

Yu Su @#ICLR2025

@ysu_nlp

2 months ago

Glad to get the 'little stamp' on my appointment letter one year ahead of the clock 🥰 It came just in time amid the peak of the AI hype week. With a bit more job security, now it's time to think about the next chapter of my career. How can one continue to make meaningful

Vardaan Pahuja

@vardaanpahuja

2 months ago

🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:

Zeyi Liao

@liaozeyi

2 months ago

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, Anthropic Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for

Huan Sun (OSU)

@hhsun1

2 months ago

Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secure is … hard. Is Anthropic Claude 4 Opus more robust to indirect prompt injection than previous versions like Claude 3.7? Not really. Why hard?

Botao Yu

@botaoyu24

2 months ago

🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall

Chan Hee (Luke) Song

@luke_ch_song

2 months ago

Heading to #CVPR2025 to present our Oral paper with NVIDIA Robotics! 📅 June 14 (Sat) | 🕐 1:00 PM | 📍Oral Session 4B @ ExHall A2 I’ll also be at the 3D-VLA/VLM and EVAL-FoMo 2 workshops presenting the same work. Come say hi!

Yu Su @#ICLR2025

@ysu_nlp

a month ago

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

Jianyang Gu

@vimar_gu

a month ago

It’s so exciting to see BioCLIP 2 demonstrate a biologically meaningful embedding space while being trained only to distinguish species. Can’t wait to see more applications of BioCLIP 2 in solving real-world problems. I’m attending #CVPR2025 in Nashville. Happy to chat about it!

Huan Sun (OSU)

@hhsun1

a month ago

Quizzing BioCLIP about an animal/plant has been another fun activity we do at a zoo/garden. Most of the time, it does get things right! Now check out BioCLIP 2, with much stronger performance and nice properties!

Yifei Li

@yifeilipku

a month ago

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

OSU NLP Group

@osunlp

a month ago

Proud of Chan Hee (Luke) Song's oral presentation at #CVPR2025!

Huan Sun (OSU)

@hhsun1

a month ago

If you care about building AI co-scientists for data-driven discovery, check out our recent work on automatically collecting large-scale, authentic, high-quality scientific coding tasks at a low cost, led by Yifei Li, Hanane Moussa, and OSU NLP Group. 🌟AutoSDT: Scaling Data-Driven

XLANG NLP Lab

@xlangnlp

a month ago

🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔Which VLMs act better as computer use agents (CUAs)?
1. Claude Sonnet 4 🥇
2. Claude 3.7 Sonnet 🥈
3. UI-TARS-1.5 🥉
4. Operator
More insights in the thread 👇 arena.xlang.ai

OSU NLP Group

@osunlp

a month ago

Our group is known for producing widely adopted benchmarks (MMMU, Mind2Web, TravelPlanner, ScienceAgentBench, etc.). Mind2Web 2 is probably the benchmark we have spent the most time on ever. 26 authors spent over 6 months tackling the emerging evaluation crisis head-on. Check it out!

Huan Sun (OSU)

@hhsun1

a month ago

Rigorously evaluating agentic systems has been one of our pursuits at OSU NLP Group, with prior efforts including Mind2Web and ScienceAgentBench. Today we introduce Mind2Web 2 to evaluate the emerging Deep Research-like agents: It features realistic and diverse long-horizon web

Tianci Xue

@xue_tianci

18 days ago

Thrilled to announce that our work Online-Mind2Web has been accepted to the Conference on Language Modeling! 🎉 It's my first PhD work and first paper at COLM. See you in Montreal! 🍁 Several teams are already testing their agents on Online-Mind2Web. If you're curious about how your agent performs, try
