Lukasz Staniszewski (@lukxst)'s Twitter Profile
Lukasz Staniszewski

@lukxst

ID: 1794472237644288000

25-05-2024 20:55:06

4 Tweets

8 Followers

92 Following

Bartosz Cywiński (@bartoszcyw):

🔥 New Paper! How can sparse autoencoders (SAEs) applied to diffusion models help us solve real-world challenges? 🚀 Introducing 𝗦𝗔𝗲𝗨𝗿𝗼𝗻: We use SAEs for unlearning in diffusion models and outperform existing baselines! Here's how it works: 🧵 1/

Bartosz Cywiński (@bartoszcyw):

🔥 New ICLR 2025 Paper! It would be cool to control the content of text generated by diffusion models with less than 1% of parameters, right? And how about doing it across diverse architectures and within various applications? 🚀 🫡 Together with Lukasz Staniszewski, we show how: 🧵 1/

Bartosz Cywiński (@bartoszcyw):

New paper: Deceptive LLMs may keep secrets from their operators. Can we elicit this latent knowledge? Maybe! Our LLM knows a secret word, which we extract with mech interp & black-box baselines. We open-source our model; how much better can you do? w/ Emil Ryd, Senthooran Rajamanoharan, Neel Nanda
