
Edoardo Debenedetti
@edoardo_debe
PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | Help 🇺🇦 on standforukraine.com | From 🇪🇺🇮🇹 | prev. Student Researcher at @google
ID: 789510059915153408
http://edoardo.science 21-10-2016 16:54:01
920 Tweets
1.1K Followers
1.1K Following


Shipment arrived on time. All non-sleepy members of SPY Lab are now in Singapore. Come meet us! Edoardo Debenedetti, Daniel Paleka, Michael Aerni, Jie Zhang, Kristina Nikolic




Thanks Center for AI Safety for the generous prize! AgentDojo is the reference for evaluating prompt injections in LLM agents, and is used for red-teaming at many frontier labs. I had a blast working on this with Edoardo Debenedetti, Jie Zhang, Marc Fischer, Luca Beurer-Kellner, and Mislav Balunović

So stoked for the recognition that AgentDojo got by winning a SafeBench first prize! A big thank you to Center for AI Safety and the prize judges. Creating this with Jie Zhang, Luca Beurer-Kellner, Marc Fischer, Mislav Balunović, and Florian Tramèr was amazing! Check out the thread to learn more

The Jailbreak Tax got a Spotlight award at ICML Conference. See you in Vancouver!


Anthropic is really lucky to get Javier Rando, we'll miss him at SPY Lab!

Following on Andrej Karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?



Our new Google DeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.




