Siting Li (@sitingli627)'s Twitter Profile
Siting Li

@sitingli627

PhD student @uwcse

ID: 1710003762159558656

Website: https://lst627.github.io/ · Joined: 05-10-2023 18:47:40

19 Tweets

92 Followers

276 Following

Avinandan Bose (@avibose22)'s Twitter Profile Photo

🧠 Your LLM should model how you think, not reduce you to preassigned traits
📢 Introducing LoRe: a low-rank reward modeling framework for personalized RLHF
❌ Demographic grouping/handcrafted traits
✅ Infers implicit preferences
✅ Few-shot adaptation
📄 arxiv.org/abs/2504.14439
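A rough sketch of the low-rank idea (not the paper's implementation — the sizes, basis, and toy data below are all my own assumptions): model each user's reward as a user-specific mixture of a small set of shared basis reward functions, so few-shot personalization only has to fit the mixture weights.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 4, 8                      # K basis reward heads over D-dim response features (toy sizes)
B = rng.normal(size=(K, D))      # shared low-rank reward basis (stand-in for a learned one)

def basis_rewards(x):
    """K basis reward scores for a response feature vector x."""
    return B @ x

def fit_user_weights(pairs, steps=500, lr=0.1):
    """Fit user-specific mixture weights from a few (chosen, rejected) pairs
    with a Bradley-Terry logistic loss -- only K parameters per user."""
    w = np.zeros(K)
    for _ in range(steps):
        grad = np.zeros(K)
        for chosen, rejected in pairs:
            delta = basis_rewards(chosen) - basis_rewards(rejected)
            p = 1.0 / (1.0 + np.exp(-np.clip(w @ delta, -30, 30)))  # P(chosen beats rejected)
            grad += (1.0 - p) * delta                               # ascend the log-likelihood
        w += lr * grad / len(pairs)
    return w

# Simulate a user whose true preferences mix the basis rewards.
w_true = np.array([1.0, -0.5, 0.0, 0.8])
true_reward = lambda x: w_true @ basis_rewards(x)

# Few-shot adaptation: 16 preference pairs labeled by the true reward.
pairs = []
for _ in range(16):
    a, b = rng.normal(size=D), rng.normal(size=D)
    pairs.append((a, b) if true_reward(a) > true_reward(b) else (b, a))

w_hat = fit_user_weights(pairs)
fitted_reward = lambda x: w_hat @ basis_rewards(x)

# The fitted reward should rank held-out pairs like the true one.
held_out = [(rng.normal(size=D), rng.normal(size=D)) for _ in range(200)]
agree = np.mean([(true_reward(a) > true_reward(b)) == (fitted_reward(a) > fitted_reward(b))
                 for a, b in held_out])
```

Because only the K-dim weight vector is per-user, adaptation from a handful of comparisons is cheap — no demographic buckets or handcrafted traits involved.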
Stella Li (@stellalisy)'s Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
Jacqueline He (@jcqln_h)'s Twitter Profile Photo

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content.

We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
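A toy sketch of what a PIC-style check could look like (this is my own stand-in, not the paper's actual metric: real grounding needs entailment, and naive word overlap is used here only for illustration): flag output sentences that no given claim supports.

```python
def word_overlap(a, b):
    """Jaccard overlap between two sentences' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def unsupported_sentences(output, claims, threshold=0.5):
    """Sentences of `output` that no claim supports above `threshold` --
    a crude proxy for intrinsic hallucination (plausible but unsupported)."""
    sentences = [s.strip() for s in output.split(".") if s.strip()]
    return [s for s in sentences
            if max(word_overlap(s, c) for c in claims) < threshold]

# Hypothetical claims and output, purely illustrative.
claims = ["the model was trained on 1B tokens",
          "evaluation used the MATH-500 benchmark"]
output = "the model was trained on 1B tokens. it achieves state of the art results"
flagged = unsupported_sentences(output, claims)
```

The second sentence is flagged: it sounds plausible but is grounded in none of the given verifiable claims — exactly the failure mode PIC targets.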
Rulin Shao (@rulinshao)'s Twitter Profile Photo

🎉 Our Spurious Rewards paper is available on arXiv! We added experiments on
- More prompts/steps/models/analysis...
- Spurious Prompts!
Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️

Check out our 2nd blog: tinyurl.com/spurious-prompt
Thao Nguyen (@thao_nguyen26)'s Twitter Profile Photo

Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔
We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats!

arxiv.org/abs/2506.04689
Scott Geng (@scottgeng00)'s Twitter Profile Photo

🤔 How do we train AI models that surpass their teachers?

🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯

The secret? Learn from the *differences* in weak data pairs!

📜 arxiv.org/abs/2507.06187

🧵 below
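A minimal sketch of why differences in weak data can be enough, using the standard DPO loss (my own toy numbers, not the paper's setup): the loss depends only on how the chosen and rejected responses differ relative to a reference model, so uniformly degrading both responses leaves the training signal unchanged.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss on one preference pair: -log sigmoid of the margin between
    the chosen/rejected log-prob gaps of the policy and the reference."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Two pairs with the same internal gap, one from "strong" responses and one
# from "weak" ones (every log-prob shifted down by 40): identical loss.
loss_strong = dpo_loss(-10.0, -12.0, ref_chosen=-11.0, ref_rejected=-11.0)
loss_weak   = dpo_loss(-50.0, -52.0, ref_chosen=-51.0, ref_rejected=-51.0)
```

The absolute quality of the responses cancels out of the margin; only the *delta* between them survives — which is consistent with the thread's claim that pairs of weak outputs can still teach a strong model.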
Stella Li (@stellalisy)'s Twitter Profile Photo

WHY do you prefer something over another?

Reward models treat preference as a black-box😶‍🌫️but human brains🧠decompose decisions into hidden attributes

We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨

Why it matters👉🏻🧵
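A toy illustration of attribute-decomposed preference (the scorers, weights, and texts below are my own stand-ins, not PrefPalette's learned components): score each response on named attributes, combine them with context weights, and report per-attribute contributions that explain *why* one response wins.

```python
# Hypothetical attribute scorers -- crude keyword/length heuristics for illustration.
ATTRIBUTES = {
    "brevity":    lambda t: 1.0 / (1.0 + len(t.split()) / 20.0),
    "politeness": lambda t: float(any(w in t.lower() for w in ("please", "thanks", "happy to"))),
    "hedging":    lambda t: float(any(w in t.lower() for w in ("might", "perhaps", "may"))),
}

# Per-context attribute weights (learned in the real system; hand-set here).
weights = {"brevity": 0.5, "politeness": 1.0, "hedging": -0.3}

def decomposed_preference(a, b):
    """Return the preferred response plus a per-attribute breakdown,
    instead of a single black-box preference scalar."""
    contrib = {name: weights[name] * (f(a) - f(b)) for name, f in ATTRIBUTES.items()}
    total = sum(contrib.values())
    return ("A" if total > 0 else "B"), contrib

choice, why = decomposed_preference(
    "Happy to help! Here is a short answer.",
    "Perhaps this long response, which might cover every case, may help you somewhat eventually.",
)
```

Here `why` shows the decision is driven mostly by the politeness attribute — the kind of interpretable attribution a black-box reward model cannot provide.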