Fazl Barez (@fazlbarez)'s Twitter Profile
Fazl Barez

@fazlbarez

Making AI safe one Google doc at a time | Let's build AIs we can trust!

ID: 1341019917005537280

Link: https://fbarez.github.io · Joined: 21-12-2020 13:57:26

464 Tweets

1.1K Followers

729 Following

Maheep Chaudhary | महीप चौधरी💡 (@chaudharymaheep)'s Twitter Profile Photo

Do apply!!! Over the past few months, I have had the pleasure of working with Fazl, and it has been a highly productive experience. He is currently accepting students at Oxford for this summer.

Tingchen Fu (seek for PhD at 25 fall) (@tingchenfu)'s Twitter Profile Photo

1/🧵 New research alert 🚨: "Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models" (Paper: arxiv.org/pdf/2505.14810).

We found a surprising paradox between instruction-following ability and reasoning ability. Here’s why ↓
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

LLMs can produce harmful content, sometimes triggered by specific inputs or through hidden vulnerabilities.

Safety-Net monitors LLMs' internal states in real-time to predict harmful outputs before generation occurs.

It uses an unsupervised method, treating normal behavior as a
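
The tweet is cut off above, but the gist is an unsupervised monitor over the model's internal states. As a rough, hypothetical sketch of that general idea, and not the actual Safety-Net implementation, here is a minimal example where the hidden states, the detector choice (an IsolationForest), and the threshold are all stand-ins:

```python
# Hypothetical sketch: flag anomalous internal activations before generation.
# This only illustrates unsupervised monitoring of hidden states; it is not
# the Safety-Net method from the linked paper.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Stand-in for hidden states collected from benign prompts: (n_samples, hidden_dim).
normal_hidden_states = rng.normal(0.0, 1.0, size=(500, 64))

# Fit an unsupervised detector on "normal behavior" only.
detector = IsolationForest(contamination="auto", random_state=0)
detector.fit(normal_hidden_states)

def flag_before_generation(hidden_state: np.ndarray, threshold: float = 0.0) -> bool:
    """Return True if the current hidden state looks anomalous (potentially unsafe)."""
    score = detector.decision_function(hidden_state.reshape(1, -1))[0]
    return score < threshold  # lower scores mean more anomalous under IsolationForest

# Example: a shifted activation pattern, standing in for a jailbreak-like input.
suspicious_state = rng.normal(4.0, 1.0, size=64)
print(flag_before_generation(suspicious_state))  # likely True
```
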
Xin Cynthia Chen (@xincynthiachen)'s Twitter Profile Photo

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models

We introduce SaP (Safety Polytope) - a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights.
🧵
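
As a toy, hypothetical sketch of what a learned "safety polytope" over a representation space could look like (not the paper's actual SaP method; the facet directions, offsets, dimensions, and labels below are invented for illustration):

```python
# Hypothetical sketch: a safe region {x : A @ x <= b} in representation space,
# with one interpretable facet per constraint. Illustrative only, not SaP itself.
import numpy as np

class SafetyPolytope:
    """Safe set defined by learned half-space constraints A @ x <= b."""

    def __init__(self, facets: np.ndarray, offsets: np.ndarray, names: list[str]):
        self.A = facets      # (n_constraints, hidden_dim) facet normals
        self.b = offsets     # (n_constraints,) offsets
        self.names = names   # human-readable label per facet

    def violated_facets(self, x: np.ndarray) -> list[str]:
        """Names of the constraints that the representation x violates."""
        slack = self.A @ x - self.b
        return [name for name, s in zip(self.names, slack) if s > 0]

    def is_safe(self, x: np.ndarray) -> bool:
        return not self.violated_facets(x)

# Toy example in a 3-dimensional "representation space" with two made-up facets.
polytope = SafetyPolytope(
    facets=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    offsets=np.array([0.5, 0.5]),
    names=["harmful-content direction", "deception direction"],
)
print(polytope.violated_facets(np.array([0.9, 0.1, 0.0])))  # ['harmful-content direction']
```
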
David Krueger (@davidskrueger)'s Twitter Profile Photo

I will likely be looking for students at the University of Montreal / Mila to start January 2026. The deadline to apply is September 1, 2025. I will share more details later, but wanted to start getting it on people's radar!

David Duvenaud (@davidduvenaud)'s Twitter Profile Photo

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good.  So we’re hosting a workshop!

Post-AGI Civilizational Equilibria: Are there any good ones?

Vancouver, July 14th

Featuring: Joe Carlsmith, Richard Ngo, Emmett Shear 🧵