
Alex Pan
@aypan_17
CS PhD @UCBerkeley working on LLM safety and interpretability
ID: 1602117652889178113
http://aypan17.github.io
12-12-2022 01:47:01
29 Tweets
331 Followers
202 Following

Model developers try to train “safe” models that refuse to help with malicious tasks like hacking... but in new work with Jacob Steinhardt and Anca Dragan, we show that such models still enable misuse: adversaries can combine multiple safe models to bypass safeguards 1/n
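For rough intuition about the threat model, here is a minimal sketch (not the paper's actual pipeline): an adversary splits a disallowed task into subtasks that each look benign in isolation, routes them to different "safe" models, and stitches the answers together. `query_model`, `combine_safe_models`, and the routing scheme are all hypothetical names for illustration.

```python
# Hedged sketch of combining multiple "safe" models, under the assumption
# that each individual prompt looks benign enough to avoid a refusal.
# `query_model` is a hypothetical stand-in for any chat-completion API.

from typing import Callable

def query_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper around a hosted LLM API; returns the completion."""
    raise NotImplementedError("plug in your provider's client here")

def combine_safe_models(subtasks: list[str],
                        models: list[str],
                        query: Callable[[str, str], str] = query_model) -> str:
    """Route each benign-looking subtask to a (possibly different) safe model,
    then concatenate the partial answers into a solution to the full task."""
    parts = []
    for i, subtask in enumerate(subtasks):
        model = models[i % len(models)]  # spread subtasks across models
        parts.append(query(model, subtask))
    return "\n".join(parts)

# Because no single model ever sees the full malicious task, per-model
# refusal training never triggers, yet the combined output can still
# serve the adversary's original goal.
```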
