Robert Kirk (@_robertkirk)'s Twitter Profile
Robert Kirk

@_robertkirk

Research Scientist at @AISecurityInst; PhD Student @ucl_dark. LLMs, AI Safety, Generalisation; @Effect_altruism

ID: 1219002945246834688

Link: https://robertkirk.github.io/ · Joined: 19-01-2020 21:05:31

335 Tweets

1.1K Followers

268 Following

Tim Rocktäschel (@_rockt)'s Twitter Profile Photo

Our UCL DARK MSc student Yi Xu managed to get his work accepted as a spotlight paper at ICML Conference 2025 (top 2.6% of submissions) 🚀 What an amazing success, and a testament to the outstanding supervision by Robert Kirk and Laura Ruis.

AI Security Institute (@aisecurityinst)'s Twitter Profile Photo

🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.

Geoffrey Irving (@geoffreyirving)'s Twitter Profile Photo

We wrote out a very speculative safety case sketch for low-stakes alignment, based on safe-but-intractable computations using humans, scalable oversight, and learning theory + exploration guarantees. It does not work yet; the goal is to find and clarify alignment subproblems. 🧵

AI Security Institute (@aisecurityinst)'s Twitter Profile Photo

We’ve written a safety case for safeguards against misuse, including a methodology for connecting the results of safeguard evaluations to risk estimates🛡️ This helps make safeguard evaluations actionable, which is increasingly important as AI systems increase in capability.

Robert Kirk (@_robertkirk)'s Twitter Profile Photo

New paper! With Joshua Clymer, Jonah Weinbaum and others, we’ve written a safety case for safeguards against misuse. We lay out how developers can connect safeguard evaluation results to real-world decisions about how to deploy models. 🧵

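For intuition only: a minimal sketch (in Python) of how an evaluated bypass rate might be combined with assumptions about attacker behaviour to produce a coarse risk estimate. The function, parameters, and numbers below are hypothetical illustrations, not the methodology from the paper.

```python
# Toy illustration only -- NOT the methodology from the paper.
# One simple way a measured safeguard bypass rate could feed a coarse risk estimate.

def expected_successful_misuse(attempts_per_year: float,
                               bypass_rate: float,
                               detection_rate: float = 0.0) -> float:
    """Expected successful misuse attempts per year under strong simplifying assumptions.

    attempts_per_year: assumed number of serious misuse attempts (an assumption,
        not something safeguard evaluations measure directly).
    bypass_rate: fraction of red-team attacks that defeated safeguards in evaluation.
    detection_rate: fraction of successful bypasses later caught by monitoring.
    """
    return attempts_per_year * bypass_rate * (1.0 - detection_rate)


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration.
    risk = expected_successful_misuse(attempts_per_year=1_000,
                                      bypass_rate=0.02,
                                      detection_rate=0.5)
    print(f"Expected successful misuse attempts per year: {risk:.1f}")
```
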
Geoffrey Irving (@geoffreyirving)'s Twitter Profile Photo

Now all three of AISI's Safeguards, Control, and Alignment Teams have a paper sketching safety cases for technical mitigations, on top of our earlier sketch for inability arguments related to evals. :)

Sahar Abdelnabi 🕊 (on 🦋) (@sahar_abdelnabi)'s Twitter Profile Photo

The Hawthorne effect describes how study participants modify their behavior if they know they are being observed. In our paper 📢, we study whether LLMs exhibit analogous patterns 🧠 Spoiler: they do ⚠️ 🧵1/n

Avi Schwarzschild (@a_v_i__s)'s Twitter Profile Photo

Ever tried to tell if someone really forgot your birthday? ... evaluating forgetting is tricky. Now imagine doing that… but for an LLM… with privacy on the line. We studied how to evaluate machine unlearning, and we found some problems. 🧵

Rylan Schaeffer (@rylanschaeffer)'s Twitter Profile Photo

A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉 TLDR: Best-of-N was shown to exhibit power (polynomial) law scaling (left), but the maths suggests one should expect exponential scaling (center). We show how to

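For intuition on why exponential scaling is the natural expectation: if each of N i.i.d. samples solves the task with probability p and a perfect verifier selects any success, then the Best-of-N failure probability is (1 − p)^N, which decays exponentially in N. A minimal sketch under those simplifying assumptions (not the paper's actual analysis):

```python
import random

# Minimal sketch: under i.i.d. samples with per-sample success probability p and
# a perfect verifier, the Best-of-N failure probability is (1 - p)**N, i.e. it
# decays exponentially in N. (Simplifying assumptions; not the paper's analysis.)

def best_of_n_failure_rate(p: float, n: int, trials: int = 100_000) -> float:
    """Monte Carlo estimate of P(all N samples fail) for Best-of-N sampling."""
    failures = 0
    for _ in range(trials):
        if not any(random.random() < p for _ in range(n)):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    p = 0.1  # hypothetical per-sample success probability
    for n in (1, 2, 4, 8, 16):
        empirical = best_of_n_failure_rate(p, n)
        closed_form = (1 - p) ** n
        print(f"N={n:2d}  empirical={empirical:.4f}  closed-form={closed_form:.4f}")
```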