
UW NLP
@uwnlp
The NLP group at the University of Washington.
ID: 3716745856
20-09-2015 10:26:25
1,1K Tweets
12,12K Followers
170 Following




Cracking the multi-turn safety challenge! X-Teaming is a scalable red-teaming framework that reveals diverse multi-turn LM vulnerabilities. Sneak peek: a 96.2% attack success rate on Claude 3.7, despite its single-turn robustness, and the largest multi-turn safety dataset!

Time to stress-test your AI agents: say hello to DoomArena! A modular framework to red-team AI agents in realistic threat settings. Plug in attacks, swap threat models, and see what breaks. Built for adaptability, designed for chaos. Live now: github.com/ServiceNow/Doo…


Excited to be at #ICLR2025 🤩 I'll be giving an oral presentation for Creativity Index on Fri 25th at 11:06 in Garnet 212 & 219. I'll also be presenting two posters: ExploreToM (Sat 26th, 10:00, Hall 3 + 2B #49) and Creativity Index (Fri 25th, 10:30, Hall 3 + 2B #618). Hope to see you there!




Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues!


Data curation is crucial for LLM reasoning, but how do we know whether our dataset generalizes to unseen distributions rather than overfitting to one benchmark? Data diversity is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵





We present Self-RedTeam, a fully online self-play multi-agent reinforcement learning (MARL) algorithm that co-evolves an Attacker and a Defender, both played by the same LM policy, in a continuous training
