Simon Shaolei Du (@simonshaoleidu) Twitter Tweets • TwiCopy

Simon Shaolei Du

@simonshaoleidu


Assistant Professor @uwcse. Postdoc @the_IAS. PhD in machine learning @mldcmu.

ID: 913981622193664000

Link: http://simonshaoleidu.com

Joined: 30-09-2017 04:19:34

497 Tweets

7.7K Followers

2.2K Following

Simon Shaolei Du

@simonshaoleidu

7 months ago

Excited to share our new work led by Kunal Jha: scaling training to more diverse environments is key to human-AI cooperation!

16 likes, 0 replies, 0 reposts

Simon Shaolei Du

@simonshaoleidu

7 months ago

Sampler is crucial for faster convergence of online DPO! Check out our paper: arxiv.org/abs/2409.19605 #ICLR2025

23 likes, 0 replies, 3 reposts

🧠 Your LLM should model how you think, not reduce you to preassigned traits 📢 Introducing LoRe: a low-rank reward modeling framework for personalized RLHF ❌ Demographic grouping/handcrafted traits ✅ Infers implicit preferences ✅ Few-shot adaptation 📄 arxiv.org/abs/2504.14439

110 likes, 2 replies, 26 reposts

Simon Shaolei Du

@simonshaoleidu

7 months ago

Excited to share our work led by Yiping Wang: RLVR with only ONE training example can boost 37% accuracy on MATH500.

49 likes, 2 replies, 5 reposts

Kunal Jha

@kjha02

7 months ago

So excited to announce our work was accepted as a Spotlight paper to ICML Conference!!! I'm looking forward to presenting our work there this summer and CogSci Society! Big thank you again to my collaborators Wilka Carvalho, Yancheng Liang, Simon Shaolei Du, Max Kleiman-Weiner, and Natasha Jaques

68 likes, 3 replies, 10 reposts

Shane Gu

@shaneguml

6 months ago

Famous LLM researcher Bruce Lee quote: "I fear not the LLM who has practiced 10,000 questions once, but I fear the LLM who has practiced one question 10,000 times."

693 likes, 22 replies, 89 reposts

Simon Shaolei Du

@simonshaoleidu

6 months ago

Even with the same vision encoder, generative VLMs (LLaVA) can extract more information than CLIP. Why? Check out our #ACL2025NLP paper led by Siting Li: arxiv.org/pdf/2411.05195

17 likes, 1 reply, 2 reposts

Simon Shaolei Du

@simonshaoleidu

6 months ago

Our new paper tries to uncover what we really need in applying RLVR.

19 likes, 0 replies, 0 reposts

Simon Shaolei Du

@simonshaoleidu

6 months ago

PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770 Led by Ruizhe Shi and Minhak Song

97 likes, 0 replies, 18 reposts

Allen School

@uwcse

6 months ago

Congratulations to University of Washington #UWAllen Ph.D. grads Ashish Sharma & Sewon Min, Association for Computing Machinery Doctoral Dissertation Award honorees! Sharma won for #AI tools for mental health; Min received honorable mention for efficient, flexible language models. #ThisIsUW news.cs.washington.edu/2025/06/04/all…

100 likes, 0 replies, 18 reposts

Kunal Jha

@kjha02

6 months ago

Oral ICML Conference!!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my collaborators enough: Wilka Carvalho, Yancheng Liang, Simon Shaolei Du, Max Kleiman-Weiner, and Natasha Jaques

49 likes, 0 replies, 3 reposts

Simon Shaolei Du

@simonshaoleidu

5 months ago

Check out our new work using online multi-agent RL for LM safety.

20 likes, 1 reply, 2 reposts

Yiping Wang

@ypwang61

5 months ago

I'll present StoryEval tomorrow at CVPR, happy to catch up with new and old friends! 📍ExHall D, Poster #284 ⌚10.30am - 12.30 pm at 6.14

18 likes, 0 replies, 3 reposts

Avinandan Bose

@avibose22

5 months ago

🚨 Code is live! Check out LoRe – a modular, lightweight codebase for personalized reward modeling from user preferences. 📦 Few-shot personalization 📊 Benchmarks: TLDR, PRISM, PersonalLLM 👉 github.com/facebookresear… Huge thanks to AI at Meta for open-sourcing this research 🙌

21 likes, 0 replies, 6 reposts

Paresh Chaudhary

@pareshrc

5 months ago

1/6 Current AI agent training methods fail to capture diverse behaviors needed for human-AI cooperation. GOAT (Generative Online Adversarial Training) uses online adversarial training to explore a pre-trained generative model's latent space to generate realistic yet challenging

17 likes, 1 reply, 7 reposts