Jan Wehner (@janwehner436164) 's Twitter Profile
Jan Wehner

@janwehner436164

ELLIS PhD student in ML Safety @CISPA | AI Safety, Security, Interpretability

ID: 1798743386092158976

Joined: 06-06-2024 15:47:13

12 Tweets

55 Followers

70 Following

Subho Majumdar | শুভব্রত মজুমদার (@sbmisi) 's Twitter Profile Photo

For model devs releasing LLMs in the open through Hugging Face, it's currently impossible to protect those against malicious finetuning. But what if there was a way to "immunize" them? In this work led by Domenic Anthony Rosati, we do exactly that! 1/5

LLM Security (@llm_sec) 's Twitter Profile Photo

Representation noising effectively prevents harmful fine-tuning on LLMs "we propose Representation Noising (RepNoise), a defence mechanism that is effective even when attackers have access to the weights and the defender no longer has any control. RepNoise works by removing

Jan Wehner (@janwehner436164) 's Twitter Profile Photo

🧠🔧 Representation Engineering is a new method for understanding and controlling the behaviour of LLMs by identifying and steering representations of concepts. I'm very excited about the method and wrote this post as an Introduction: lesswrong.com/posts/3ghj8EuK…

Sahar Abdelnabi 🕊 (on 🦋) (@sahar_abdelnabi) 's Twitter Profile Photo

LLMs are increasingly used for self-refining, highly capable agents exploring open-ended worlds and driving scientific discovery. This raises BIG questions about safety 🦺! Our new paper tackles this head-on! 🧵1/n
