
Wenxiao Wang
@wenxiao__wang
CS PhD student at UMD: ML robustness, AI security and privacy, representation learning
ID: 1489639849435049994
https://wangwenxiao.github.io
04-02-2022 16:40:21
99 Tweets
116 Followers
58 Following



🚨 Releasing the SCOTUS 2024 Legal Scenarios Benchmark 🚨 We’re excited to launch a new benchmark with 200+ realistic legal dilemmas from 2024 Supreme Court slip opinions—built using RELAI Data Agents. We tested top LLMs on legal reasoning: 🥇 o4-mini — 76.4% OpenAI Sam Altman




Nice clean application of certified robustness to detect test set contamination with provably low false positive rates. Way to go, Yize Cheng and Wenxiao Wang!

