Usman Anwar (@usmananwar391) 's Twitter Profile
Usman Anwar

@usmananwar391

Deep Learning & AI Safety @Cambridge_uni

ID: 3623883973

Link: http://uzman-anwar.github.io · Joined: 20-09-2015 05:59:06

1.1K Tweets

678 Followers

1.1K Following

Ben Goldhaber (@bengoldhaber) 's Twitter Profile Photo

We're launching the FLF Incubator Fellowship on AI for Human Reasoning! 🚀 We're seeking talented researchers and builders to develop AI tools that enhance epistemics and coordination. 12 weeks, $25k-$50k stipend. Applications due June 9th.

Joschka Braun (@braunjoschka) 's Twitter Profile Photo

1/ Controlling LLMs with steering vectors is unreliable, but why? Our paper, "Understanding (Un)Reliability of Steering Vectors in Language Models," at the Foundation Models in the Wild Workshop @ ICLR 2025 investigates this! What did we find?
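
For readers unfamiliar with the technique, here is a minimal sketch of what "steering vector" control usually means: a fixed direction added to a model's hidden activations at one layer via a forward hook. The toy model, layer choice, and scale below are illustrative assumptions, not the paper's setup.

```python
# Minimal activation-steering sketch (illustrative, not the paper's code):
# a fixed "steering vector" is added to a model's hidden state at one
# layer via a forward hook.
import torch
import torch.nn as nn

hidden_dim = 64

# Stand-in for a stack of transformer blocks.
model = nn.Sequential(*[nn.Linear(hidden_dim, hidden_dim) for _ in range(4)])

# A steering vector, e.g. the mean activation difference between two sets
# of prompts (here just a random unit vector for illustration).
steering_vector = torch.randn(hidden_dim)
steering_vector = steering_vector / steering_vector.norm()
scale = 4.0  # steering strength; reliability is often sensitive to this

def steer(module, inputs, output):
    # Shift this layer's output along the steering direction.
    return output + scale * steering_vector

# Attach the hook to one intermediate "layer".
handle = model[2].register_forward_hook(steer)

x = torch.randn(1, hidden_dim)
steered_out = model(x)
handle.remove()
unsteered_out = model(x)
print((steered_out - unsteered_out).norm())
```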

Daniel Murfet (@danielmurfet) 's Twitter Profile Photo

A few months ago I resigned from my tenured position at the University of Melbourne and joined Timaeus as Director of Research. Timaeus is an AI safety non-profit research organisation. [1/n]🧵

Francesco Orabona (@bremen79) 's Twitter Profile Photo

As promised, we put on Arxiv the proof we did with Gemini. arxiv.org/pdf/2505.20219

This shows that the Polyak stepsize not only fails to reach the optimum but can even cycle when used without knowledge of f*.
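
For reference, the claim concerns the classical Polyak stepsize, which requires the optimal value f*; in its standard textbook form (written from the general definition, not copied from the linked paper):

```latex
% Polyak stepsize for minimizing f, with (sub)gradient g_t at x_t:
\[
  x_{t+1} = x_t - \gamma_t\, g_t,
  \qquad
  \gamma_t = \frac{f(x_t) - f^*}{\lVert g_t \rVert^2}.
\]
% If f^* is replaced by an estimate \hat{f} \neq f^*, the iterates need
% not converge to a minimizer; the linked note shows they can even cycle.
```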

Gemini failed when prompted directly ("Find an example where the
Shashwat Goel (@shashwatgoel7) 's Twitter Profile Photo

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled discrepancies in a blog below🧵👇

Xin Cynthia Chen (@xincynthiachen) 's Twitter Profile Photo

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models

We introduce SaP (Safety Polytope) - a geometric approach to LLM safety that learns and enforces safety constraints in LLM's representation space, with interpretable insights.
🧵
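
As a rough illustration of the geometric idea (not the paper's implementation): a polytope is an intersection of half-spaces in representation space, and a hidden state is flagged when it violates any learned facet. The dimensions and random parameters below are placeholders.

```python
# Rough "safety polytope" sketch (illustrative): the safe region is an
# intersection of half-spaces { h : W h + b <= 0 }. A representation h
# is "safe" iff it satisfies every facet; violated facets indicate which
# learned constraint fired.
import numpy as np

d = 16          # representation dimension (made up)
num_facets = 8  # number of learned half-space constraints (made up)

rng = np.random.default_rng(0)
W = rng.normal(size=(num_facets, d))  # facet normals (would be learned)
b = rng.normal(size=num_facets)       # facet offsets (would be learned)

def check_safety(h):
    margins = W @ h + b                 # one margin per facet
    violated = np.where(margins > 0)[0]
    return violated.size == 0, violated

h = rng.normal(size=d)                  # stand-in for an LLM hidden state
is_safe, violated = check_safety(h)
print(is_safe, violated)
```
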
Seán Ó hÉigeartaigh (@s_oheigeartaigh) 's Twitter Profile Photo

New working paper (pre-review), maybe my most important in recent years. I examine the evidence for the US-China race to AGI and decisive strategic advantage, & analyse the impact this narrative is having on our prospects for cooperation on safety. 1/5 papers.ssrn.com/abstract=52786…

Ekdeep Singh Lubana (@ekdeepl) 's Twitter Profile Photo

🚨 New paper alert! Linear representation hypothesis (LRH) argues concepts are encoded as **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
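
For context, a minimal sketch of the decomposition that the LRH and SAEs assume, in which an activation is reconstructed as a sparse combination of dictionary directions; the dimensions and ReLU encoder below are generic assumptions, not the paper's setup.

```python
# Minimal sparse-autoencoder-style decomposition (illustrative): an
# activation x is approximated as a sparse combination of dictionary
# directions, x ≈ decoder(f(x)), which is the picture LRH/SAEs assume.
import torch
import torch.nn as nn

d_model, d_dict = 32, 128  # activation and dictionary sizes (made up)

encoder = nn.Linear(d_model, d_dict)
decoder = nn.Linear(d_dict, d_model, bias=False)

x = torch.randn(4, d_model)        # stand-in activations
codes = torch.relu(encoder(x))     # sparse (non-negative) feature codes
x_hat = decoder(codes)             # reconstruction from dictionary directions

# Training would minimize reconstruction error plus an L1 sparsity penalty.
loss = ((x - x_hat) ** 2).mean() + 1e-3 * codes.abs().mean()
print(loss.item())
```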

Desi R. Ivanova (@desirivanova) 's Twitter Profile Photo

Apple's new pre-print "The illusion of thinking" is as problematic as that team's earlier GSM-Symbolic paper. Many might say "it's just a preprint". Well, despite clear flaws highlighted during the "peer review process" of the GSM-Symbolic paper, it was accepted at ICLR 2025. 1/5

Atoosa Kasirzadeh (@dr_atoosa) 's Twitter Profile Photo

Ever since I first heard the slogan “AI as normal technology,” I’ve felt uneasy. Tonight that unease crystallised. In this 🧵I unpack what normal hides and why the metaphor may ultimately fail to capture AI’s abnormal impacts on human life and societies. 1/n

Ilia Sucholutsky (@sucholutsky) 's Twitter Profile Photo

🚨 New preprint: "Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships"! 🚨 We propose a framework for understanding unique risks posed by AI thought partners (AITPs)—AI systems that collaborate with humans on complex reasoning tasks, not just simple tool use

Sumeet Motwani (@sumeetrm) 's Twitter Profile Photo

Interested in shaping the next generation of safe AI? SoLaR@COLM 2025 is looking for paper submissions and reviewers!

🤖 ML track: algorithms, math, computation 
📚 Socio-technical track: policy, ethics, human participant research

Submit your paper or sign up below to review by
Aidan Kierans (@aidankierans) 's Twitter Profile Photo

🤖 Calling all philosophers and AI researchers! Our team at UConn Computer Science & Engineering's RIET Lab is hosting a virtual workshop on Machine Ethics and Reasoning (MERe) on July 18, 2025. We're bringing together philosophy PhDs, CS researchers & AI folks to advance computational moral reasoning 🧵

Richard Ngo (@richardmcngo) 's Twitter Profile Photo

I’m concerned that the AI safety community’s focus on short timelines is causing it to operate in a counterproductive scarcity mindset. I’d prefer a portfolio approach where individuals focus on more or less urgent scenarios in proportion to how emotionally resilient they are.

xuan (ɕɥɛn / sh-yen) (@xuanalogue) 's Twitter Profile Photo

For that reason, there already is economic pressure towards "value collapse" (cf. C. Thi Nguyen) -- reshaping people's wants & ideals into easily produced & satisfied forms -- and we might well expect this pressure to increase as other parts of the economy become automated.

Karim Abdel Sadek (@karim_abdelll) 's Twitter Profile Photo

*New AI Alignment Paper* 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the human's intended goal. 😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!
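
For reference, the minimax regret objective over a set of training environments in its standard form (written from the general definition, not copied from the paper):

```latex
% Regret of policy \pi in environment e, relative to the best policy there:
%   Regret(\pi, e) = \max_{\pi'} V_e(\pi') - V_e(\pi)
% The trained policy minimizes its worst-case regret over environments:
\[
  \pi^\star \in \arg\min_{\pi}\ \max_{e \in \mathcal{E}}
  \Bigl[\, \max_{\pi'} V_e(\pi') - V_e(\pi) \,\Bigr]
\]
```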

Keyon Vafa (@keyonv) 's Twitter Profile Photo

Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵