Yulin Chen (@yulinchen99)'s Twitter Profile
Yulin Chen

@yulinchen99

PhD Student at @nyuniversity @CILVRatNYU | Previously @TsinghuaNLP

ID: 1473167814131593221

Link: https://yulinchen99.github.io/ | Joined: 21-12-2021 05:46:16

16 Tweets

177 Followers

157 Following

Uri Alon (@urialon1)'s Twitter Profile Photo


A new preprint 📢
arxiv.org/pdf/2301.02828…

K-nearest neighbors language models (kNN-LMs; Urvashi Khandelwal (@ukhndlwl) et al., ICLR 2020) improve the perplexity of standard LMs, even when they retrieve examples from the *same training set that the base LM was trained on*.

but why?

(1/3)
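
For readers outside the thread: a minimal sketch of the kNN-LM interpolation the tweet refers to (Khandelwal et al., ICLR 2020). The base LM's next-token distribution is mixed with a distribution built from the k nearest stored training contexts. The function name and the L2-distance, softmax, k, and lambda choices below are illustrative assumptions, not the preprint's settings.

import numpy as np

def knn_lm_probs(p_lm, query, keys, values, vocab_size, k=8, lam=0.25):
    """Sketch of kNN-LM interpolation (illustrative defaults, not the paper's).

    p_lm:   base LM next-token distribution, shape (vocab_size,)
    query:  context representation for the current step, shape (d,)
    keys:   datastore of context representations, shape (N, d)
    values: next-token id stored with each key, shape (N,), integer dtype
    """
    # Retrieve the k closest datastore entries by L2 distance.
    dists = np.linalg.norm(keys - query, axis=1)
    nn = np.argsort(dists)[:k]

    # Softmax over negative distances gives a distribution over retrieved tokens.
    logits = -dists[nn]
    w = np.exp(logits - logits.max())
    w /= w.sum()

    p_knn = np.zeros(vocab_size)
    for weight, tok in zip(w, values[nn]):
        p_knn[tok] += weight

    # kNN-LM prediction: lambda * p_kNN + (1 - lambda) * p_LM.
    return lam * p_knn + (1.0 - lam) * p_lm

The question the preprint takes up is why this helps even when the keys and values come from the very training set the base LM was trained on.
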
Ning Ding (@stingning)'s Twitter Profile Photo

Good work! We also released a paper on UltraFuser and UltraChat 2 in a similar spirit: fusing highly specialized experts can be effective. See github.com/thunlp/UltraCh…. 🤗

Owain Evans (@owainevans_uk)'s Twitter Profile Photo


Surprising new results:
We finetuned GPT-4o on a narrow task of writing insecure code without warning the user.
This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.

This is *emergent misalignment* & we cannot fully explain it 🧵

Yulin Chen (@yulinchen99)'s Twitter Profile Photo

We're excited to see such wide attention from the community. Thank you for your support! We have released the code, trained probes, and the generated CoT data 👇 github.com/AngelaZZZ-611/… Labeled answer data is on the way. Stay tuned!

John (Yueh-Han) Chen (@jcyhc_ai)'s Twitter Profile Photo


LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately.

💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings!
🛡️Our monitoring method defends with 93% success! 🧵