Usman Anwar (@usmananwar391) 's Twitter Profile
Usman Anwar

@usmananwar391

Deep Learning & AI Safety @Cambridge_uni

ID: 3623883973

Link: http://uzman-anwar.github.io · Joined: 20-09-2015 05:59:06

1.1K Tweets

678 Followers

1.1K Following

Ben Goldhaber (@bengoldhaber) 's Twitter Profile Photo

We're launching the FLF Incubator Fellowship on AI for Human Reasoning! 🚀 We're seeking talented researchers and builders to develop AI tools that enhance epistemics and coordination. 12 weeks, $25k-$50k stipend. Applications due June 9th.

Joschka Braun (@braunjoschka) 's Twitter Profile Photo

1/ Controlling LLMs with steering vectors is unreliable, but why? Our paper, "Understanding (Un)Reliability of Steering Vectors in Language Models," at the Foundation Models in the Wild Workshop @ ICLR 2025 investigates this! What did we find?
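
For readers unfamiliar with the technique, here is a minimal sketch of what "steering vector" control usually means: a fixed direction added to a model's hidden activations at one layer via a forward hook. The toy model, layer choice, and scale below are illustrative assumptions, not the paper's setup.

```python
# Minimal activation-steering sketch (illustrative, not the paper's code):
# a fixed "steering vector" is added to a model's hidden state at one
# layer via a forward hook.
import torch
import torch.nn as nn

hidden_dim = 64

# Stand-in for a stack of transformer blocks.
model = nn.Sequential(*[nn.Linear(hidden_dim, hidden_dim) for _ in range(4)])

# A steering vector, e.g. the mean activation difference between two sets
# of prompts (here just a random unit vector for illustration).
steering_vector = torch.randn(hidden_dim)
steering_vector = steering_vector / steering_vector.norm()
scale = 4.0  # steering strength; reliability is often sensitive to this

def steer(module, inputs, output):
    # Shift this layer's output along the steering direction.
    return output + scale * steering_vector

# Attach the hook to one intermediate "layer".
handle = model[2].register_forward_hook(steer)

x = torch.randn(1, hidden_dim)
steered_out = model(x)
handle.remove()
unsteered_out = model(x)
print((steered_out - unsteered_out).norm())
```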

Daniel Murfet (@danielmurfet) 's Twitter Profile Photo

A few months ago I resigned from my tenured position at the University of Melbourne and joined Timaeus as Director of Research. Timaeus is an AI safety non-profit research organisation. [1/n]🧵

Francesco Orabona (@bremen79) 's Twitter Profile Photo

As promised, we put on Arxiv the proof we did with Gemini. arxiv.org/pdf/2505.20219

This shows that the Polyak stepsize not only fails to reach the optimum but can even cycle when used without knowledge of f*.
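
For reference, the claim concerns the classical Polyak stepsize, which requires the optimal value f*; in its standard textbook form (written from the general definition, not copied from the linked paper):

```latex
% Polyak stepsize for minimizing f, with (sub)gradient g_t at x_t:
\[
  x_{t+1} = x_t - \gamma_t\, g_t,
  \qquad
  \gamma_t = \frac{f(x_t) - f^*}{\lVert g_t \rVert^2}.
\]
% If f^* is replaced by an estimate \hat{f} \neq f^*, the iterates need
% not converge to a minimizer; the linked note shows they can even cycle.
```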

Gemini failed when prompted directly ("Find an example where the
Shashwat Goel (@shashwatgoel7) 's Twitter Profile Photo

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled discrepancies in a blog below🧵👇

Xin Cynthia Chen (@xincynthiachen) 's Twitter Profile Photo

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models

We introduce SaP (Safety Polytope) - a geometric approach to LLM safety that learns and enforces safety constraints in LLM's representation space, with interpretable insights.
🧵
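
As a rough illustration of the geometric idea (not the paper's implementation): a polytope is an intersection of half-spaces in representation space, and a hidden state is flagged when it violates any learned facet. The dimensions and random parameters below are placeholders.

```python
# Rough "safety polytope" sketch (illustrative): the safe region is an
# intersection of half-spaces { h : W h + b <= 0 }. A representation h
# is "safe" iff it satisfies every facet; violated facets indicate which
# learned constraint fired.
import numpy as np

d = 16          # representation dimension (made up)
num_facets = 8  # number of learned half-space constraints (made up)

rng = np.random.default_rng(0)
W = rng.normal(size=(num_facets, d))  # facet normals (would be learned)
b = rng.normal(size=num_facets)       # facet offsets (would be learned)

def check_safety(h):
    margins = W @ h + b                 # one margin per facet
    violated = np.where(margins > 0)[0]
    return violated.size == 0, violated

h = rng.normal(size=d)                  # stand-in for an LLM hidden state
is_safe, violated = check_safety(h)
print(is_safe, violated)
```
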
Seán Ó hÉigeartaigh (@s_oheigeartaigh) 's Twitter Profile Photo

New working paper (pre-review), maybe my most important in recent years. I examine the evidence for the US-China race to AGI and decisive strategic advantage, & analyse the impact this narrative is having on our prospects for cooperation on safety. 1/5 papers.ssrn.com/abstract=52786…

Ekdeep Singh Lubana (@ekdeepl) 's Twitter Profile Photo

🚨 New paper alert! Linear representation hypothesis (LRH) argues concepts are encoded as **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
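
For context, a minimal sketch of the decomposition that the LRH and SAEs assume, in which an activation is reconstructed as a sparse combination of dictionary directions; the dimensions and ReLU encoder below are generic assumptions, not the paper's setup.

```python
# Minimal sparse-autoencoder-style decomposition (illustrative): an
# activation x is approximated as a sparse combination of dictionary
# directions, x ≈ decoder(f(x)), which is the picture LRH/SAEs assume.
import torch
import torch.nn as nn

d_model, d_dict = 32, 128  # activation and dictionary sizes (made up)

encoder = nn.Linear(d_model, d_dict)
decoder = nn.Linear(d_dict, d_model, bias=False)

x = torch.randn(4, d_model)        # stand-in activations
codes = torch.relu(encoder(x))     # sparse (non-negative) feature codes
x_hat = decoder(codes)             # reconstruction from dictionary directions

# Training would minimize reconstruction error plus an L1 sparsity penalty.
loss = ((x - x_hat) ** 2).mean() + 1e-3 * codes.abs().mean()
print(loss.item())
```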

Desi R. Ivanova (@desirivanova) 's Twitter Profile Photo

Apple's new pre-print "The illusion of thinking" is as problematic as that team's earlier GSM-Symbolic paper. Many might say "it's just a preprint". Well, despite clear flaws highlighted during the "peer review process" of the GSM-Symbolic paper, it was accepted at ICLR 2025. 1/5

Atoosa Kasirzadeh (@dr_atoosa) 's Twitter Profile Photo

Ever since I first heard the slogan “AI as normal technology,” I’ve felt uneasy. Tonight that unease crystallised. In this 🧵I unpack what normal hides and why the metaphor may ultimately fail to capture AI’s abnormal impacts on human life and societies. 1/n

Ilia Sucholutsky (@sucholutsky) 's Twitter Profile Photo

🚨 New preprint: "Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships"! 🚨 We propose a framework for understanding unique risks posed by AI thought partners (AITPs)—AI systems that collaborate with humans on complex reasoning tasks, not just simple tool use

Sumeet Motwani (@sumeetrm) 's Twitter Profile Photo

Interested in shaping the next generation of safe AI? SoLaR@COLM 2025 is looking for paper submissions and reviewers!

🤖 ML track: algorithms, math, computation 
📚 Socio-technical track: policy, ethics, human participant research

Submit your paper or sign up below to review by
Aidan Kierans (@aidankierans) 's Twitter Profile Photo

🤖 Calling all philosophers and AI researchers! Our team at UConn Computer Science & Engineering's RIET Lab is hosting a virtual workshop on Machine Ethics and Reasoning (MERe) on July 18, 2025. We're bringing together philosophy PhDs, CS researchers & AI folks to advance computational moral reasoning 🧵

Richard Ngo (@richardmcngo) 's Twitter Profile Photo

I’m concerned that the AI safety community’s focus on short timelines is causing it to operate in a counterproductive scarcity mindset. I’d prefer a portfolio approach where individuals focus on more or less urgent scenarios in proportion to how emotionally resilient they are.

xuan (ɕɥɛn / sh-yen) (@xuanalogue) 's Twitter Profile Photo

For that reason, there already is economic pressure towards "value collapse" (cf. C. Thi Nguyen) -- reshaping people's wants & ideals into easily produced & satisfied forms -- and we might well expect this pressure to increase as other parts of the economy become automated.

Karim Abdel Sadek (@karim_abdelll) 's Twitter Profile Photo

*New AI Alignment Paper* 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the human's intended goal. 😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!
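
For reference, the minimax regret objective over a set of training environments in its standard form (written from the general definition, not copied from the paper):

```latex
% Regret of policy \pi in environment e, relative to the best policy there:
%   Regret(\pi, e) = \max_{\pi'} V_e(\pi') - V_e(\pi)
% The trained policy minimizes its worst-case regret over environments:
\[
  \pi^\star \in \arg\min_{\pi}\ \max_{e \in \mathcal{E}}
  \Bigl[\, \max_{\pi'} V_e(\pi') - V_e(\pi) \,\Bigr]
\]
```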

Keyon Vafa (@keyonv) 's Twitter Profile Photo

Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵