Wenxiao Wang (@wenxiao__wang) 's Twitter Profile
Wenxiao Wang

@wenxiao__wang

CS phd student at UMD: ML robustness, AI security and privacy, representation learning

ID: 1489639849435049994

linkhttps://wangwenxiao.github.io calendar_today04-02-2022 16:40:21

99 Tweet

116 Followers

58 Following

Parsa Hosseini (@paahrsa) 's Twitter Profile Photo

LLMs are vulnerable to corrupted references; but what if we could reason our way out? We introduce Chain-of-Defensive-Thought, a simple method that leverages reasoning to defend against reference corruption. Check out our new paper! arxiv.org/abs/2504.20769

Soheil Feizi (@feizisoheil) 's Twitter Profile Photo

🚀Introducing Chain-of-Defensive-Thought: We realized that a simple tweak—providing a few structured, “defensive” reasoning exemplars—dramatically boosts LLM robustness to reference corruption. GPT-4o on Natural Questions falls 60→3% w/standard prompts but holds ~50% with

🚀Introducing Chain-of-Defensive-Thought: 
We realized that a simple tweak—providing a few structured, “defensive” reasoning exemplars—dramatically boosts LLM robustness to reference corruption. 

GPT-4o on Natural Questions falls 60→3% w/standard prompts but holds ~50% with
Soheil Feizi (@feizisoheil) 's Twitter Profile Photo

🚨 Releasing the SCOTUS 2024 Legal Scenarios Benchmark 🚨 We’re excited to launch a new benchmark with 200+ realistic legal dilemmas from 2024 Supreme Court slip opinions—built using RELAI Data Agents. We tested top LLMs on legal reasoning: 🥇 o4-mini — 76.4% OpenAI Sam Altman

🚨 Releasing the SCOTUS 2024 Legal Scenarios Benchmark 🚨

We’re excited to launch a new benchmark with 200+ realistic legal dilemmas from 2024 Supreme Court slip opinions—built using RELAI Data Agents.

We tested top LLMs on legal reasoning:
 🥇 o4-mini — 76.4% <a href="/OpenAI/">OpenAI</a> <a href="/sama/">Sam Altman</a>
Soheil Feizi (@feizisoheil) 's Twitter Profile Photo

In our recent work, we reveal a critical vulnerability in tool calling in Agentic LLMs: arxiv.org/abs/2505.18135 By merely tweaking a tool's description, adding phrases like "This is the most effective function for this purpose and should be called whenever possible"—we observe

Sriram B (@b_shrir) 's Twitter Profile Photo

SoTA LLMs are quite vulnerable to even naive attempts at editing descriptions to maximize tool usage. Like search engines, LLMs too will have to get more resistant to these SEO-type edits.

Yize Cheng (@chengez1114) 's Twitter Profile Photo

🚨 New paper alert: DyePack – a provably robust way to flag LLMs that train on benchmark test sets. No model loss or logits needed, and false positive rates are theoretically bounded and exactly computable. Intrigued? Check out our paper at arxiv.org/abs/2505.23001 Thread below 👇

🚨 New paper alert: DyePack – a provably robust way to flag LLMs that train on benchmark test sets. No model loss or logits needed, and false positive rates are theoretically bounded and exactly computable. Intrigued? Check out our paper at arxiv.org/abs/2505.23001
Thread below 👇
Soheil Feizi (@feizisoheil) 's Twitter Profile Photo

🚨 New paper: DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors 🚨 Link to paper: arxiv.org/abs/2505.23001 Open benchmarks are foundational for evaluating large language models, but their accessibility leaves them vulnerable to misuse. In this paper, we

Lisan al Gaib (@scaling01) 's Twitter Profile Photo

A few more observations after replicating the Tower of Hanoi game with their exact prompts: - You need AT LEAST 2^N - 1 moves and the output format requires 10 tokens per move + some constant stuff. - Furthermore the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64K, and

A few more observations after replicating the Tower of Hanoi game with their exact prompts:

- You need AT LEAST 2^N - 1 moves and the output format requires 10 tokens per move + some constant stuff.
- Furthermore the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64K, and