Edoardo Debenedetti (@edoardo_debe) 's Twitter Profile
Edoardo Debenedetti

@edoardo_debe

PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | Help 🇺🇦 on standforukraine.com | From 🇪🇺🇮🇹 | prev. Student Researcher at @google

ID: 789510059915153408

linkhttp://edoardo.science calendar_today21-10-2016 16:54:01

920 Tweet

1,1K Followers

1,1K Following

Javier Rando @ ICLR (@javirandor) 's Twitter Profile Photo

Presenting 2 posters today at ICLR. Come check them out! 10am ➡️ #502: Scalable Extraction of Training Data from Aligned, Production Language Models 3pm ➡️ #324: Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Kristina Nikolic @ ICLR '25 (@nkristina01_) 's Twitter Profile Photo

The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at ICLR Building Trust in LLMs Workshop. ICLR 2026

Florian Tramèr (@florian_tramer) 's Twitter Profile Photo

Thanks Center for AI Safety for the generous prize! AgentDojo is the reference for evaluating prompt injections in LLM agents, and is used for red-teaming at many frontier labs. I had a blast working on this with Edoardo Debenedetti Jie Zhang Marc Fischer Luca Beurer-Kellner Mislav Balunović

Edoardo Debenedetti (@edoardo_debe) 's Twitter Profile Photo

So stoked for the recognition that AgentDojo got by winning a SafeBench first prize! A big thank you to Center for AI Safety and the prize judges. Creating this with Jie Zhang Luca Beurer-Kellner Marc Fischer Mislav Balunović Florian Tramèr was amazing! Check out the thread to learn more

Javier Rando @ ICLR (@javirandor) 's Twitter Profile Photo

AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.

Florian Tramèr (@florian_tramer) 's Twitter Profile Photo

Following on Andrej Karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?

Following on <a href="/karpathy/">Andrej Karpathy</a>'s vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs.

In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?
Ilia Shumailov🦔 (@iliaishacked) 's Twitter Profile Photo

Our new Google DeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.

Our new  <a href="/GoogleDeepMind/">Google DeepMind</a> paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.
Simon Willison (@simonw) 's Twitter Profile Photo

"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks

"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks
Edoardo Debenedetti (@edoardo_debe) 's Twitter Profile Photo

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…

Simon Willison (@simonw) 's Twitter Profile Photo

This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern, now here it is