Alexandra Souly (@alexandrasouly)'s Twitter Profile
Alexandra Souly

@alexandrasouly

Working on LLM Safeguards at @AISecurityInst

ID: 1557411732905148416

Joined: 10-08-2022 17:01:40

11 Tweets

115 Followers

211 Following

Alexandra Souly (@alexandrasouly)'s Twitter Profile Photo

Looking forward to presenting our work Leading the Pack: N-player Opponent Shaping tomorrow at the Multi-Agent Security Workshop #NeurIPS23 MASec Workshop
Paper: openreview.net/pdf?id=3b8hfpq…
Thanks to my co-authors Timon Willi, akbir., Robert Kirk, Chris Lu, Edward Grefenstette and Tim Rocktäschel

UCL DARK (@ucl_dark)'s Twitter Profile Photo

Exciting day ahead!
- Roberta Raileanu's talk on ICL for sequential decision-making tasks at 4pm (238-239)
- An oral by Alexandra Souly on N-player opponent shaping at 10:40AM (223)
- The SoLaR @ NeurIPS2024 workshop (R06-R09)
- A poster at 8:15AM on generalisation in offline RL (238-239)

Dominik Schmidt (@schmidtdominik_)'s Twitter Profile Photo

Extremely excited to announce new work (w/ Minqi Jiang) on learning RL policies and world models purely from action-free videos. 🌶️🌶️ LAPO learns a latent representation for actions from observation alone and then derives a policy from it. Paper: arxiv.org/abs/2312.10812

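The core idea can be sketched in a few lines. Below is a minimal, illustrative sketch under assumed MLP components and made-up dimensions, not the paper's code: an inverse-dynamics model infers a latent action from consecutive observations, a forward-dynamics model must reconstruct the next observation from that latent action, and a policy is then trained to predict the latent action from the current observation alone.

import torch
import torch.nn as nn

OBS_DIM, LATENT_ACT_DIM = 64, 8  # made-up sizes for illustration

# Inverse dynamics model (IDM): infers a latent action z_t from (o_t, o_t+1).
idm = nn.Sequential(nn.Linear(2 * OBS_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_ACT_DIM))
# Forward dynamics model (FDM): must reconstruct o_t+1 from (o_t, z_t),
# which forces z_t to carry action-like information.
fdm = nn.Sequential(nn.Linear(OBS_DIM + LATENT_ACT_DIM, 128), nn.ReLU(), nn.Linear(128, OBS_DIM))
# Latent policy: predicts z_t from o_t only, usable at decision time.
policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_ACT_DIM))

wm_opt = torch.optim.Adam(list(idm.parameters()) + list(fdm.parameters()), lr=3e-4)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def world_model_step(obs, next_obs):
    # Phase 1: learn latent actions from action-free (o_t, o_t+1) pairs.
    z = idm(torch.cat([obs, next_obs], dim=-1))
    pred_next = fdm(torch.cat([obs, z], dim=-1))
    loss = nn.functional.mse_loss(pred_next, next_obs)
    wm_opt.zero_grad()
    loss.backward()
    wm_opt.step()
    return loss.item()

def policy_step(obs, next_obs):
    # Phase 2: behaviour-clone the IDM's latent actions using o_t alone.
    with torch.no_grad():
        z_target = idm(torch.cat([obs, next_obs], dim=-1))
    loss = nn.functional.mse_loss(policy(obs), z_target)
    pi_opt.zero_grad()
    loss.backward()
    pi_opt.step()
    return loss.item()

# Stand-in data; real training uses consecutive video frames.
obs, next_obs = torch.randn(32, OBS_DIM), torch.randn(32, OBS_DIM)
world_model_step(obs, next_obs)
policy_step(obs, next_obs)

In the paper the latent action additionally passes through a quantisation bottleneck so the IDM cannot simply copy the next observation into z, and the latent policy is later grounded in real actions with a small amount of labelled data or online fine-tuning; see the arXiv link above for the actual method.
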
Edward Grefenstette (@egrefen)'s Twitter Profile Photo

Opponent Shaping allows agents to learn to cooperate. Sounds nice, but do these methods scale past two agents? If not, why not, and what can be done? Alexandra Souly, Timon Willi, akbir. and colleagues answer these questions and more [12/24] openreview.net/forum?id=3b8hf…

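For readers unfamiliar with opponent shaping, the two-player mechanism can be sketched as follows: the shaping agent differentiates through its opponent's anticipated learning step (in the style of LOLA). The toy one-shot prisoner's dilemma below only illustrates the update rule, not the paper's N-player results; the payoffs and step sizes are illustrative, not taken from the paper.

import torch

# Prisoner's dilemma payoffs (both agents maximise): R, S, T, P.
R, S, T, P = -1.0, -3.0, 0.0, -2.0

def values(theta1, theta2):
    p1, p2 = torch.sigmoid(theta1), torch.sigmoid(theta2)  # P(cooperate)
    v1 = p1*p2*R + p1*(1-p2)*S + (1-p1)*p2*T + (1-p1)*(1-p2)*P
    v2 = p1*p2*R + (1-p1)*p2*S + p1*(1-p2)*T + (1-p1)*(1-p2)*P
    return v1, v2

theta1 = torch.zeros(1, requires_grad=True)  # shaping agent
theta2 = torch.zeros(1, requires_grad=True)  # naive learner
lr, opp_lr = 0.3, 0.3

for _ in range(200):
    v1, v2 = values(theta1, theta2)
    # Anticipate the opponent's naive gradient step, kept differentiable so
    # agent 1 can backprop through it -- this is the "shaping" term.
    grad2 = torch.autograd.grad(v2, theta2, create_graph=True)[0]
    v1_shaped, _ = values(theta1, theta2 + opp_lr * grad2)
    grad1 = torch.autograd.grad(v1_shaped, theta1)[0]
    with torch.no_grad():
        theta1 += lr * grad1               # gradient ascent on the shaped value
        theta2 += opp_lr * grad2.detach()  # opponent's plain naive update

print(torch.sigmoid(theta1).item(), torch.sigmoid(theta2).item())
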
Dominik Schmidt (@schmidtdominik_)'s Twitter Profile Photo

The code + new results for LAPO, an ⚡ICLR Spotlight⚡ (w/ Minqi Jiang) are now out ‼️ LAPO learns world models and policies directly from video, without any action labels, enabling training of agents from web-scale video data alone. Links below ⤵️

Xander Davies (@alxndrdavies)'s Twitter Profile Photo

Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of LLM 𝑎𝑔𝑒𝑛𝑡𝑠 developed at @AISafetyInst in collaboration with Gray Swan AI! 🧵 1/N

AI Security Institute (@aisecurityinst)'s Twitter Profile Photo

We've released a technical report detailing our pre-deployment testing of Anthropic's upgraded Claude 3.5 Model with the U.S. AI Safety Institute. Read our blog for a high-level overview. aisi.gov.uk/work/pre-deplo…

Maksym Andriushchenko @ ICLR (@maksym_andr)'s Twitter Profile Photo

Great to see that AgentHarm (arxiv.org/abs/2410.09024) has been used by the US and UK AI Safety Institutes for pre-deployment testing of the upgraded Claude 3.5 Sonnet. Also, check out the full report—it's great and will likely influence evaluation standards for new LLMs, as…

Xander Davies (@alxndrdavies)'s Twitter Profile Photo

When we were developing our agent misuse dataset, we noticed instances of models seeming to realize our tasks were fake. We're sharing some examples and we'd be excited for more research into how synthetic tasks can distort eval results! 🧵 1/N

Xander Davies (@alxndrdavies)'s Twitter Profile Photo

Defending against adversarial prompts is hard; defending against fine-tuning API attacks is much harder. In our new AI Security Institute pre-print, we break alignment and extract harmful info using entirely benign and natural interactions during fine-tuning & inference. 😮 🧵 1/10

Micah Goldblum (@micahgoldblum)'s Twitter Profile Photo

🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n

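The two configurations being compared can be written down directly with standard PyTorch optimizers; the model, learning rates, and betas below are placeholders rather than the thread's actual settings.

import torch

model = torch.nn.Linear(512, 512)  # stand-in for an LLM

# Vanilla small-batch SGD: no momentum buffer, so no per-parameter optimizer state.
sgd = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.0)

# AdamW baseline: keeps two extra state tensors (first and second moments) per parameter.
adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)

Besides the per-FLOP comparison, dropping momentum also removes the optimizer state entirely, which is a meaningful memory saving at LLM scale.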