Jan Wehner (@janwehner436164) 's Twitter Profile
Jan Wehner

@janwehner436164

ELLIS PhD student in ML Safety @CISPA | AI Safety, Security, Interpretability

ID: 1798743386092158976

Joined: 06-06-2024 15:47:13

12 Tweets

55 Followers

70 Following

Subho Majumdar | শুভব্রত মজুমদার (@sbmisi) 's Twitter Profile Photo

For model devs releasing LLMs in the open through Hugging Face, it's currently impossible to protect those against malicious finetuning. But what if there was a way to "immunize" them? In this work led by Domenic Anthony Rosati, we do exactly that! 1/5

LLM Security (@llm_sec) 's Twitter Profile Photo

Representation noising effectively prevents harmful fine-tuning on LLMs "we propose Representation Noising (RepNoise), a defence mechanism that is effective even when attackers have access to the weights and the defender no longer has any control. RepNoise works by removing

Jan Wehner (@janwehner436164) 's Twitter Profile Photo

🧠🔧 Representation Engineering is a new method for understanding and controlling the behaviour of LLMs by identifying and steering representations of concepts. I'm very excited about the method and wrote this post as an Introduction: lesswrong.com/posts/3ghj8EuK…

Sahar Abdelnabi 🕊 (on 🦋) (@sahar_abdelnabi) 's Twitter Profile Photo

LLMs are increasingly used for self-refining, highly capable agents exploring open-ended worlds and driving scientific discovery. This raises BIG questions about safety 🦺! Our new paper tackles this head-on! 🧵1/n
