Xander Davies (@alxndrdavies) Twitter Tweets • TwiCopy

Xander Davies

@alxndrdavies

+ Follow

technical staff @AISecurityInst | PhD student w @yaringal at @OATML_Oxford | prev @Harvard (haist.ai)

ID: 1244043124315508741

calendar_today28-03-2020 23:26:28

386 Tweet

1,1K Followers

647 Following

Gate.io

@gate_io

5 hours ago

🔥The 9th Round of Easy Loan, Earn $40 Reward is in progress❗️ ⏰ Promotion Period: January 15th - Feburary 15th, 2025 👉 Register now and check more details at gate.io/campaigns/358

thumb_up_off_alt34

chat_bubble_outline39

repeat6

shareShare

1/ The AI Security Institute research agenda is out - some highlights: AISI isn’t just asking what could go wrong with powerful AI systems. It’s focused on building the tools to get it right. A thread on the 3 pillars of its solutions work: alignment, control, and safeguards.

thumb_up_off_alt23

chat_bubble_outline1

repeat5

shareShare

Marie Davidsen Buhl

@mariebassbuhl

3 months ago

Can we massively scale up AI alignment research by identifying subproblems many people can work on in parallel? UK AISI’s alignment team is trying to do that. We’re starting with AI safety via debate - and we’ve just released our first paper🧵1/

thumb_up_off_alt85

chat_bubble_outline6

repeat13

shareShare

Geoffrey Irving

@geoffreyirving

2 months ago

Now all three of AISI's Safeguards, Control, and Alignment Teams have a paper sketching safety cases for technical mitigations, on top of our earlier sketch for inability arguments related to evals. :)

thumb_up_off_alt47

chat_bubble_outline2

repeat4

shareShare

Xander Davies

@alxndrdavies

2 months ago

Check out our new paper connecting safeguard evals to more end-to-end safety arguments!

thumb_up_off_alt22

chat_bubble_outline1

repeat1

shareShare

Sahar Abdelnabi 🕊 (on 🦋)

@sahar_abdelnabi

2 months ago

Hawthorne effect describes how study participants modify their behavior if they know they are being observed In our paper 📢, we study if LLMs exhibit analogous patterns🧠 Spoiler: they do⚠️ 🧵1/n

thumb_up_off_alt109

chat_bubble_outline3

repeat19

shareShare

Joe Benton

@joejbenton

a month ago

📰We've just released SHADE-Arena, a new set of sabotage evaluations. It's also one of the most complex, agentic (and imo highest quality) settings for control research to date! If you're interested in doing AI control or sabotage research, I highly recommend you check it out.

thumb_up_off_alt86

chat_bubble_outline1

repeat12

shareShare

Davis Brown

@davisbrownr

a month ago

New paper: real attackers don't jailbreak. Instead, they often use open-weight LLMs. For harder misuse tasks, they can use "decomposition attacks," where a misuse task is split into benign queries across new sessions. These answers help an unsafe model via in-context learning.

thumb_up_off_alt30

chat_bubble_outline1

repeat8

shareShare

Robert Kirk

@_robertkirk

25 days ago

New work out: We demonstrate a new attack against stacked safeguards and analyse defence in depth strategies. Excited for this joint collab between FAR.AI and AI Security Institute to be out!

thumb_up_off_alt19

chat_bubble_outline0

repeat4

shareShare

Xander Davies

@alxndrdavies

25 days ago

Excited about this work from FAR.AI x AI Security Institute !

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Xander Davies

@alxndrdavies

12 days ago

Exciting role on an important team!

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Ian Hogarth

@soundboy

10 days ago

Update from Xander Davies on the AI Security Institute's work on improving agent safeguards with OpenAI

thumb_up_off_alt17

chat_bubble_outline0

repeat4

shareShare

AI Security Institute

@aisecurityinst

9 days ago

We’re encouraged to see AISI’s safeguarding work recognised. As capabilities advance, it’s increasingly important to invest in testing and strengthening these protections.

thumb_up_off_alt26

chat_bubble_outline2

repeat3

shareShare

Hannah Rose Kirk

@hannahrosekirk

5 days ago

My team at AI Security Institute is hiring! This is an awesome opportunity to get involved with cutting-edge scientific research inside government on frontier AI models. I genuinely love my job and the team 🤗 Link: civilservicejobs.service.gov.uk/csr/jobs.cgi?j… More Info: ⬇️

thumb_up_off_alt109

chat_bubble_outline4

repeat24

shareShare