Xinyu Tang (@xinyutang7) 's Twitter Profile
Xinyu Tang

@xinyutang7

Ph.D. @Princeton | Prev intern @MSFTResearch

ID: 1206323723025965056

linkhttp://txy15.github.io calendar_today15-12-2019 21:22:48

22 Tweet

158 Followers

398 Following

Xiangyu Qi (@xiangyuqi_pton) 's Twitter Profile Photo

Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: arxiv.org/abs/2306.13213 Github Repo: github.com/Unispac/Visual…

Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓

Paper: arxiv.org/abs/2306.13213
Github Repo: github.com/Unispac/Visual…
Xinyu Tang (@xinyutang7) 's Twitter Profile Photo

Excited to share that our paper on differentially private image classification has been accepted as a spotlight at #NeurIPS2023!

Xiangyu Qi (@xiangyuqi_pton) 's Twitter Profile Photo

Meta's release of Llama-2 and OpenAI's fine-tuning APIs for GPT-3.5 pave the way for custom LLM. But what about safety? 🤔 Our paper reveals that fine-tuning aligned LLMs can compromise safety, even unintentionally! Paper: arxiv.org/abs/2310.03693 Website: llm-tuning-safety.github.io

Meta's release of Llama-2 and OpenAI's fine-tuning APIs for GPT-3.5 pave the way for custom LLM. But what about safety? 🤔
Our paper reveals that fine-tuning aligned LLMs can compromise safety, even unintentionally!

Paper: arxiv.org/abs/2310.03693
Website: llm-tuning-safety.github.io
Tong Wu (@tongwu_pton) 's Twitter Profile Photo

Concerned about Google AI overviews telling you to "glue pizza and eat rocks"? Here is our FIX. We are excited to introduce the first CERTIFIABLY robust RAG system that can provide robust answers even when some of the retrieved webpages are corrupted. 🧵[1/n]

Concerned about Google AI overviews telling you to "glue pizza and eat rocks"?

Here is our FIX. We are excited to introduce the first CERTIFIABLY robust RAG system that can provide robust answers even when some of the retrieved webpages are corrupted. 🧵[1/n]
Xiangyu Qi (@xiangyuqi_pton) 's Twitter Profile Photo

Our recent paper shows: 1. Crrent LLM safety alignment is only a few tokens deep. 2. Deepening the safety alignment can make it more robust against multiple jailbreak attacks. 3. Protecting initial token positions can make the alignment more robust against fine-tuning attacks.

Our recent paper shows:
1. Crrent LLM safety alignment is only a few tokens deep.
2. Deepening the safety alignment can make it more robust against multiple jailbreak attacks.
3. Protecting initial token positions can make the alignment more robust against fine-tuning attacks.
Tinghao Xie (@vitusxie) 's Twitter Profile Photo

🌟New LLM Safety Benchmark🌟 🥺SORRY-Bench: Systematically Evaluating LLM Safety Refusal Behaviors (sorry-bench.github.io) LLMs are trained to refuse unsafe user requests. 🤨But... are they able to give advice on adult content ♂️♀️🌈? 🧐What about generating erotic stories 📖?

🌟New LLM Safety Benchmark🌟
🥺SORRY-Bench: Systematically Evaluating LLM Safety Refusal Behaviors (sorry-bench.github.io)

LLMs are trained to refuse unsafe user requests.
🤨But... are they able to give advice on adult content ♂️♀️🌈?
🧐What about generating erotic stories 📖?
Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

In our @icml2024 paper “A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization” we release a system for differentially private training that is SOTA in speed, utility, and privacy. A key component is our HPO, which accounts for the privacy cost of tuning HPs.

In our @icml2024 paper “A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization” we release a system for differentially private training that is SOTA in speed, utility, and privacy. A key component is our HPO, which accounts for the privacy cost of tuning HPs.
Association for Computing Machinery (@theofficialacm) 's Twitter Profile Photo

"I want to see a world in which our communications have the trust embedded to achieve confidentiality." Watch @PrateekMittal_ , recipient of the 2023 ACM Grace Murray Hopper Award, talk about his efforts to develop and implement Multi-VA Defense: youtu.be/U_G9H8wquM4

"I want to see a world in which our communications have the trust embedded to achieve confidentiality." Watch @PrateekMittal_ , recipient of the 2023 ACM Grace Murray Hopper Award, talk about his efforts to develop and implement Multi-VA Defense: youtu.be/U_G9H8wquM4
Vikash Sehwag (@vsehwag_) 's Twitter Profile Photo

I'll be attending ICML from 21-26 July. I am interesting in chatting and learning more about responsible generative AI. Feel free to get in touch! Our two papers: - icml.cc/virtual/2024/p… - led by Zhenting Wang - icml.cc/virtual/2024/p… - led by Ashwinee Panda Xinyu Tang @ EMNLP2025

I'll be attending ICML from 21-26 July. 

I am interesting in chatting and learning more about responsible generative AI. Feel free to get in touch!

Our two papers:
- icml.cc/virtual/2024/p… - led by <a href="/wang1999_zt/">Zhenting Wang</a>
- icml.cc/virtual/2024/p… - led by <a href="/PandaAshwinee/">Ashwinee Panda</a> <a href="/XinyuTang7/">Xinyu Tang @ EMNLP2025</a>
Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

Excited to share Lottery Ticket Adaptation (LoTA)! We propose a sparse adaptation method that finetunes only a sparse subset of the weights. LoTA mitigates catastrophic forgetting and enables model merging by breaking the destructive interference between tasks. 🧵👇

Excited to share Lottery Ticket Adaptation (LoTA)! We propose a sparse adaptation method that finetunes only a sparse subset of the weights. LoTA mitigates catastrophic forgetting and enables model merging by breaking the destructive interference between tasks.
🧵👇
Tong Wu (@tongwu_pton) 's Twitter Profile Photo

How can LLM architecture recognize Instruction Hierarchy? 🚀 Excited to share our latest work on Instructional Segment Embedding (ISE)! A technique embeds Instruction Hierarchy directly into LLM architecture, significantly boosting LLM safety. 🧵[1/n]

How can LLM architecture recognize Instruction Hierarchy?

🚀 Excited to share our latest work on Instructional Segment Embedding (ISE)! A technique embeds Instruction Hierarchy directly into LLM architecture, significantly boosting LLM safety. 🧵[1/n]
Tianhao Wang ("Jiachen") @ICLR (@jiachenwang97) 's Twitter Profile Photo

Excited to announce the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM)! We welcome submissions exploring all aspects of data in foundation model research, including but not limited to data curation, attribution, copyright, synthetic data, benchmark, societal

Excited to announce the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM)! 
We welcome submissions exploring all aspects of data in foundation model research, including but not limited to data curation, attribution, copyright, synthetic data, benchmark, societal
Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

we show for the first time ever how to privacy audit LLM training. we give new SOTA methods that show how much models can memorize. by using our methods, you can know beforehand whether your model is going to memorize its training data, and how much, and when, and why! (1/n 🧵)

we show for the first time ever how to privacy audit LLM training. we give new SOTA methods that show how much models can memorize. by using our methods, you can know beforehand whether your model is going to memorize its training data, and how much, and when, and why! (1/n 🧵)
Tong Wu (@tongwu_pton) 's Twitter Profile Photo

🛠️ Still doing prompt engineering for R1 reasoning models? 🧩 Why not do some "engineering" in reasoning as well? Introducing our new paper, Effectively Controlling Reasoning Models through Thinking Intervention. 🧵[1/n]

🛠️ Still doing prompt engineering for R1 reasoning models?
🧩 Why not do some "engineering" in reasoning as well?
Introducing our new paper, Effectively Controlling Reasoning Models through Thinking Intervention. 
🧵[1/n]
Prateek Mittal (@prateekmittal_) 's Twitter Profile Photo

Last week, I shared two #ICLR2025 papers that were recognized by their Award committee. Reflecting on the outcome, I thought it might be interesting to share that both papers were previously rejected by #NeurIPS2024. I found the dramatic difference in reviewer perception of