Botao Yu (@botaoyu24) Twitter Tweets • TwiCopy

Vardaan Pahuja

🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:

Likes: 53 · Replies: 5 · Reposts: 23

Zeyi Liao

@liaozeyi

2 months ago

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, Anthropic Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for

Likes: 70 · Replies: 1 · Reposts: 30

Huan Sun (OSU)

@hhsun1

2 months ago

Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secure is … hard. Is Anthropic Claude 4 Opus more robust to indirect prompt injection than previous versions like Claude 3.7? Not really. Why hard?

Likes: 57 · Replies: 3 · Reposts: 24

Yu Su @#ICLR2025

@ysu_nlp

2 months ago

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

Likes: 268 · Replies: 5 · Reposts: 57

Jianyang Gu

@vimar_gu

2 months ago

It’s so exciting to see BioCLIP 2 demonstrates a biologically meaningful embedding space while only trained to distinguish species. Can’t wait to see more applications of BioCLIP 2 in solving real world problems. I’m attending #CVPR2025 in Nashville. Happy to chat about it!

Likes: 12 · Replies: 0 · Reposts: 6

Yifei Li

@yifeilipku

2 months ago

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

Likes: 72 · Replies: 4 · Reposts: 25

Saining Xie

@sainingxie

2 months ago

Had a great time at this CVPR community-building workshop---lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…

Likes: 347 · Replies: 17 · Reposts: 60

Botao Yu

@botaoyu24

2 months ago

🚀 A fantastic lineup of work in #AI4Science #DrugDiscovery #AI4Chemistry!

Likes: 3 · Replies: 0 · Reposts: 0

Botao Yu

@botaoyu24

a month ago

Holy moly, what a massive effort, proud to be part of it! 🥳 As agentic search continues to evolve and increasingly support our work and daily lives, Mind2Web 2 arrives as a timely, rigorous benchmark for evaluation and progress tracking. (Now get to work, agent builders! This

Likes: 15 · Replies: 1 · Reposts: 0

Botao Yu

@botaoyu24

22 days ago

⬇️ Check out SDE-Harness, our general framework for evaluating LLMs/agents on scientific discovery. It features easy integration, broad LLM support, dynamic prompting, comprehensive logging, and customizable metrics, applicable for all domains and tasks.

Likes: 14 · Replies: 0 · Reposts: 2

elvis

@omarsar0

21 days ago

BREAKING: xAI announces Grok 4 "It can reason at a superhuman level!" Here is everything you need to know:

Likes: 5.5K · Replies: 120 · Reposts: 409

Huan Sun (OSU)

@hhsun1

15 days ago

🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with Yu Su OSU NLP Group. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,

Likes: 64 · Replies: 1 · Reposts: 30

Jianyang Gu

@vimar_gu

7 days ago

Announcing the NeurIPS Conference 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025

Likes: 23 · Replies: 1 · Reposts: 15

Yu Su @#ICLR2025

@ysu_nlp

2 days ago

Safety is one of the biggest blockers for computer use agents: how can I trust an agent won’t accidentally do something consequential without my permission? We collect and release the first large-scale dataset for detecting consequential actions on the web, and train the best

Likes: 98 · Replies: 0 · Reposts: 19

Boyuan Zheng

@boyuan__zheng

2 days ago

Remember “Son of Anton” from the Silicon Valley show (Silicon Valley)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own

Likes: 66 · Replies: 0 · Reposts: 27

Ben Blaiszik

@benblaiszik

a day ago

I'll be sitting down for a chat with Chenru Duan, founder of Deep Principle this afternoon. We'll be talking about topics including how to benchmark LLMs for scientific tasks and journey from academia to startup. Anything you'd like to hear about? x.com/chenru_duan/st…

Likes: 19 · Replies: 1 · Reposts: 4
