Xinyu Zhu (@tianhongzxy)'s Twitter Profile
Xinyu Zhu

@tianhongzxy

CS Ph.D. student @UVA. Summer intern @Apple. I work on improving LLM reasoning. Previously: master's @Tsinghua_uni, intern @MSFTResearch Asia. #NLProc

ID: 932230530766061569

Link: https://zhuxinyu.top | Joined: 19-11-2017 12:54:13

105 Tweets

140 Followers

445 Following

Andrew Zhao (@andrewz45732491)'s Twitter Profile Photo

hmmm, if you never push probabilities up, you maintain more entropy by not doing excessive sharpening. These guys might be onto something 🧐

1a3orn (@1a3orn)'s Twitter Profile Photo

Oh man, this is a gorgeous idea. Training *against* negative samples but not towards positive ones maintains entropy in the model, and therefore increases pass@k at high k during RL.
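As a background sketch of the idea (an illustration, not the exact objective from any specific paper): a REINFORCE-style loss that only penalizes negatively rewarded samples could look like the following. The function name and signature are hypothetical.

```python
import torch

def negative_only_policy_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate that only pushes *down* on negatively
    rewarded samples and applies no gradient to positively rewarded ones.

    Pushing down on failures spreads probability mass over all other
    continuations instead of concentrating it on the few sampled
    successes, which is why entropy stays higher than in standard updates.

    logprobs: (batch,) summed log-probs of sampled sequences (requires grad)
    rewards:  (batch,) scalar rewards, e.g. +1 correct / -1 incorrect
    """
    neg_mask = (rewards < 0).float()              # keep only negative samples
    # For a reward of -1, minimizing this lowers the sample's log-prob.
    loss = -(rewards * logprobs * neg_mask).mean()
    return loss
```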

Xinyu Zhu (@tianhongzxy)'s Twitter Profile Photo

We find that a large base LM can be boosted to match its RL-tuned version 📈 without training, simply by transferring the logit difference between a small RL-tuned model and its base at inference time! 🤯 Check out the 🧵👇
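Mechanically, the described transfer resembles proxy-tuning-style logit arithmetic. A minimal sketch under the assumption that all three models share a vocabulary and expose a Hugging Face-style `model(input_ids).logits` interface; the function name is hypothetical.

```python
import torch

@torch.no_grad()
def transferred_logits(large_base, small_tuned, small_base, input_ids):
    """Steer a large *base* model with the RL 'delta' of a small model pair:

        logits = large_base(x) + (small_tuned(x) - small_base(x))

    Sampling from the combined logits approximates a large RL-tuned model
    without ever training it. Requires a shared tokenizer/vocabulary.
    """
    delta = small_tuned(input_ids).logits - small_base(input_ids).logits
    return large_base(input_ids).logits + delta

# Hypothetical usage: next-token distribution from the combined logits.
# probs = torch.softmax(transferred_logits(lb, st, sb, ids)[:, -1, :], dim=-1)
```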

Sean Welleck (@wellecks)'s Twitter Profile Photo

New paper by Andre He: "Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening" (arxiv.org/abs/2506.02355). Tired of sharpening the distribution? Try the unlikeliness reward to learn new things from the roads less traveled.
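One plausible reading of "unlikeliness reward" (hedged, since the tweet doesn't give the formula and the paper's exact shaping may differ): among correct rollouts, give extra credit to sequences the current policy finds improbable, so GRPO reinforces rare correct solutions instead of only sharpening the modes it already prefers.

```python
def shaped_reward(base_reward: float, seq_logprob: float, alpha: float = 0.1) -> float:
    """Hypothetical 'unlikeliness'-style shaping.

    base_reward: task reward (e.g. 1.0 correct / 0.0 incorrect)
    seq_logprob: policy log-probability of the sampled sequence (<= 0)
    """
    if base_reward <= 0:
        return base_reward            # no bonus for wrong answers
    unlikeliness = -seq_logprob       # larger for rarer sequences
    return base_reward + alpha * unlikeliness
```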

Stanford NLP Group (@stanfordnlp)'s Twitter Profile Photo

Only ding a model for making mistakes! It gives better results in RL and avoids mode collapse. We still understand so little about RL! But we’re learning. Your science dollars at work.

Xinyu Zhu (@tianhongzxy)'s Twitter Profile Photo

🚀 Check out our new #ICML2025 paper led by Zhepei Wei! 1.73× faster LLM decoding, with no draft model needed and no discrepancy from vanilla decoding!
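The tweet doesn't spell out the mechanism, but "no draft model" plus "no discrepancy from vanilla decoding" point at draft-free speculative decoding, whose losslessness comes from a verification step like the sketch below (greedy case; a background illustration, not the paper's code; the draft tokens could come from, e.g., the model's own shallow layers).

```python
import torch

@torch.no_grad()
def verify_draft_greedy(model, input_ids, draft_ids):
    """Lossless verification step used in speculative-style decoding:
    run the full model once over the drafted tokens and keep the longest
    prefix matching what vanilla greedy decoding would have emitted.
    Accepted tokens are identical to vanilla decoding by construction."""
    seq = torch.cat([input_ids, draft_ids], dim=-1)
    logits = model(seq).logits
    n_draft = draft_ids.shape[-1]
    # Logits at position t predict token t+1, so these are the model's
    # own greedy choices at each drafted position.
    preds = logits[:, -n_draft - 1:-1, :].argmax(dim=-1)
    match = (preds == draft_ids).long().cumprod(dim=-1)  # prefix-match mask
    n_accept = int(match.sum())
    # One free "bonus" token: the greedy prediction after the accepted prefix.
    bonus = logits[:, input_ids.shape[-1] + n_accept - 1, :].argmax(dim=-1)
    return draft_ids[:, :n_accept], bonus
```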

Xinyu Zhu (@tianhongzxy)'s Twitter Profile Photo

🚀 Interesting work by Taiqiang Wu! 💡 Quick takeaway: if you collect additional instruction-following data for SFT, it's better to fine-tune the Base model and then graft the weights onto its corresponding Instruct model, rather than continuing to train the Instruct model directly!
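One common way to realize this kind of grafting is task-vector weight arithmetic; here is a sketch under that assumption, since the tweet doesn't give the exact recipe (`graft_sft_delta` is hypothetical).

```python
import torch

def graft_sft_delta(instruct_sd, base_sd, base_sft_sd):
    """Hypothetical grafting via task-vector arithmetic:

        W_grafted = W_instruct + (W_base_sft - W_base)

    i.e. apply the delta learned by SFT on the Base model to the Instruct
    model, instead of fine-tuning Instruct directly. Assumes all three
    checkpoints share the same architecture (state_dict keys and shapes)."""
    return {
        name: instruct_sd[name] + (base_sft_sd[name] - base_sd[name])
        for name in instruct_sd
    }

# Hypothetical usage with Hugging Face state dicts:
# instruct_model.load_state_dict(
#     graft_sft_delta(instruct_model.state_dict(),
#                     base_model.state_dict(),
#                     base_sft_model.state_dict()))
```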