Hamed Hassani (@hamedshassani)'s Twitter Profile
Hamed Hassani

@hamedshassani

Associate Professor, University of Pennsylvania @PENN, Machine Learning, Information Theory

ID: 1233116922155155457

Website: https://www.seas.upenn.edu/~hassani/
Joined: 27-02-2020 19:49:27

150 Tweets

2.2K Followers

749 Following

Reza Shokri (@rzshokri)

Watermark Smoothing Attacks against Language Models: arxiv.org/abs/2407.14206. LLM watermarks as statistical perturbations to token probabilities can be removed *without* significantly reducing text quality. Adversary having access to a weaker model can infer and smooth watermarks

The Nobel Prize (@nobelprize)

BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”

Alex Robey (@alexrobey23)

I'm grateful to have received the Adversarial ML Rising Star Award! 🚀 AdvMLFrontiers is a fantastic venue. Many thanks to the award committee (Pin-Yu Chen, Bo Li, sijia.liu, Cho-Jui Hsieh) and to the workshop organizers!

Russ Salakhutdinov (@rsalakhu)

Jailbreaking LLM-Controlled Robots – Machine Learning Blog | ML@CMU | Carnegie Mellon University blog.ml.cmu.edu/2024/10/29/jai…

IEEE Spectrum (@ieeespectrum)

New research shows that AI-driven robots can be easily jailbroken and tricked into doing harmful or dangerous tasks. spectrum.ieee.org/jailbreak-llm?…

Alex Robey (@alexrobey23)

In around an hour (at 3:45pm PST), I'll be giving a talk about jailbreaking LLM-controlled robots at the AdvML workshop at #NeurIPS2024 in East Ballroom C. I'll be at the poster session directly afterward as well if anyone wants to chat about this work! 🤖

Amin Karbasi (@aminkarbasi)

🔥🔥🔥 Adversarial reasoning is born. Hot take: The core problem we address in this paper is the role of reasoning in AI safety. While there have been recent efforts by OpenAI arguing that replacing reasoning with increased compute can lead to better defense mechanisms, these

Shayan Kiyani (@shayankiyani1)

Wondering how to make high-stakes decisions using ML—in areas like medicine, robotics, or finance? Our latest work lays out a decision-theoretic foundation for risk-averse uncertainty quantification. If you want to learn how to make better calls when it truly matters, read on!

Arya Mazumdar (@mountainofmoon)

And the next EnCORE Institute workshop will be on **Theoretical Perspectives on LLMs** sites.google.com/ucsd.edu/encor… We have a great lineup of participants - and an incredible set of talks. Registration link will be active soon

Po-Shen Loh (@poshenloh)

Oh my goodness. GPT-o1 got a perfect score on my Carnegie Mellon University undergraduate #math exam, taking less than a minute to solve each problem. I freshly design non-standard problems for all of my exams, and they are open-book, open-notes. (Problems included below, with links to

Amin Karbasi (@aminkarbasi)

Our book on “Conditional Gradient Methods” will be published by SIAM. If interested, please check the final version: arxiv.org/abs/2211.14103 Authors: G. Braun, A. Carderera, C. Combettes, Hamed Hassani, Sebastian Pokutta, Aryan Mokhtari. Blurb: Elad Hazan, Francis Bach

Alex Robey (@alexrobey23)

A few days ago, we dropped 𝗮𝗻𝘁𝗶𝗱𝗶𝘀𝘁𝗶𝗹𝗹𝗮𝘁𝗶𝗼𝗻 𝘀𝗮𝗺𝗽𝗹𝗶𝗻𝗴 🚀 . . . and we've gotten a little bit of pushback. But whether you're at a frontier lab or developing smaller, open-source models, this research should be on your radar. Here's why 🧵

Sima Noorani (@nooranisimaa)

How can we quantify uncertainty in LLMs from only a few sampled outputs? The key lies in the classical problem of missing mass—the probability of unseen outputs. This perspective offers a principled foundation for conformal prediction in query-only settings like LLMs.

Shayan Kiyani (@shayankiyani1)

We push conformal prediction and its trade-offs beyond regression & classification — into query-based generative models. Surprisingly (or not?), missing mass & Good-Turing estimators emerge as key tools once again. Very excited about this one!

Alex Robey (@alexrobey23)

On Monday, I'll be presenting a tutorial on jailbreaking LLMs + the security of AI agents with Hamed Hassani and Amin Karbasi at ICML. I'll be in Vancouver all week -- send me a DM if you'd like to chat about jailbreaking, AI agents, robots, distillation, or anything else!

Ahmad Beirami @ ICLR 2025 (@abeirami)

If you are interested in safety/security jailbreaking of LLMs, defenses against them, and how the safety issues become more complicated when we design agentic workflows, this tutorial by Hamed Hassani, Amin Karbasi, Alex Robey is highly recommended
