Baohao Liao (@baohao_liao) 's Twitter Profile
Baohao Liao

@baohao_liao

Interning @Meta; PhD @UvA_Amsterdam. Previously studied at @RWTH and @sjtu1896

ID: 1413795317460443136

Link: https://sites.google.com/view/baohaoliao · Joined: 10-07-2021 09:40:58

198 Tweets

224 Followers

387 Following

Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo


Check out our work on Reward-Guided Speculative Decoding! 🚀
• Use PRM for reward-guided sampling — a mixture distribution
• Prove binary weighting is optimal under budget constraints
• Saves 4.4× FLOPs in STEM
• Outperforms speculative decoding 🔥💡

arxiv.org/pdf/2501.19324
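Not from the paper itself — a toy sketch of the binary-weighting idea from the tweet, with stub `draft_step`/`target_step`/`prm_score` functions standing in for the real draft LLM, target LLM, and process reward model:

```python
import random

random.seed(0)

# Hypothetical stand-ins for the three models involved.
def draft_step(prefix):          # cheap draft-model proposal
    return prefix + "d"

def target_step(prefix):         # expensive target-model generation
    return prefix + "T"

def prm_score(prefix, step):     # PRM judges the proposed step
    return random.random()

def reward_guided_decode(prompt, n_steps, threshold=0.5):
    """Binary weighting: keep the cheap draft step when its PRM score
    clears the threshold, otherwise pay for a target-model step."""
    out, target_calls = prompt, 0
    for _ in range(n_steps):
        cand = draft_step(out)
        if prm_score(out, cand) >= threshold:
            out = cand               # accept the draft step for free
        else:
            out = target_step(out)   # fall back to the big model
            target_calls += 1
    return out, target_calls

text, calls = reward_guided_decode("Q: ", n_steps=10)
print(text, "| target calls:", calls)
```

The FLOP savings come from `target_calls` staying well below `n_steps` whenever the draft model's steps score high enough.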
Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Impressed by DeepSeek-R1 and o3? They are long-reasoning models, though, and often generate >4k tokens for hard questions. That's very time-consuming! Here is our solution to speed up their inference! #deepseek #DeepSeekR1 #o1 #o3 #reward

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Sample-efficient finetuning sounds great, but it can mislead people. Including instruction-following data in the pretraining stage is quite common now, so the base model is not actually "base": obtaining AIME-level results from ~1k samples is more like format following

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Since ARR has been reduced from 6 cycles per year to 5, shouldn't we extend the author response period? Active and thorough author-reviewer discussion is important!!! #ACL #EMNLP #ARR

Katia Shutova (@katiashutova) 's Twitter Profile Photo

Come and join us at AmsterdamNLP! We have two open PhD positions in #NLProc with a focus on multilingual NLP and LLM alignment. Looking for students with an NLP/ML background and an interest in language and society. werkenbij.uva.nl/en/vacancies/t…

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Unbelievable: 2 papers with 7 reviews for #ARR Feb, and only one reviewer replied. The funny thing is, they mistook a baseline for our method in the new experiments and claimed our method is not good enough 😅 It seems the review system is falling apart! ACL 2025 #ACL

Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo

🤖 What makes GRPO work? Rejection Sampling → Reinforce → GRPO
- RS is underrated
- Key of GRPO: it implicitly removes prompts without a correct answer
- Reinforce + Filtering > GRPO (better KL)
💻 github.com/RLHFlow/Minima…
📄 arxiv.org/abs/2504.11343
👀 RAFT was invited to ICLR25! Come & chat ☕️
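A minimal sketch of the filtering step the tweet credits (not the authors' code): with binary rewards over a group of samples per prompt, drop groups with no correct answer — and all-correct groups, which also carry zero group-normalized advantage:

```python
def filter_groups(groups):
    """Keep only prompt groups with mixed rewards. All-zero groups
    ("prompts without correct answer") and all-one groups contribute
    no learning signal under group-normalized advantages."""
    kept = []
    for rewards in groups:
        if any(r == 1 for r in rewards) and any(r == 0 for r in rewards):
            kept.append(rewards)
    return kept

# Each inner list: binary rewards for k sampled answers to one prompt.
groups = [[0, 0, 0, 0],   # no correct answer -> removed
          [1, 0, 1, 0],   # mixed -> kept, useful gradient
          [1, 1, 1, 1]]   # all correct -> removed (zero advantage)
print(filter_groups(groups))  # -> [[1, 0, 1, 0]]
```

Reinforce would then train only on the surviving groups, which is the "Reinforce + Filtering" recipe the tweet compares against GRPO.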

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

It’s been two weeks since I joined the Llama Agent team at AI at Meta as an intern. #Agent is more interesting than I thought! Excited for the journey ahead.

Yan Meng (@vivian_yanmy) 's Twitter Profile Photo

Tomorrow I will present our paper about data quality for MT at 9:00 AM in Hall 3 at #NAACL2025. Happy to meet you there :)

Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo

How to improve test-time scalability?
- Separate thinking & solution phases to control performance under a budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
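A toy illustration of the budget-constrained idea (not the paper's implementation), assuming a hypothetical `</think>` delimiter separating the thinking and solution phases:

```python
THINK_END = "</think>"

def budget_truncate(tokens, budget):
    """If the thinking phase exceeds the token budget, cut it off and
    close it, forcing the model into its solution phase so it always
    answers within budget."""
    if THINK_END in tokens[:budget]:
        return tokens                          # thinking finished in budget
    return tokens[:budget] + [THINK_END]       # truncate, force solution

trace = ["step1", "step2", "step3", "</think>", "answer"]
print(budget_truncate(trace, 2))   # -> ['step1', 'step2', '</think>']
print(budget_truncate(trace, 10))  # -> unchanged, thinking fit the budget
```

Training with rollouts truncated this way is what lets the model keep its accuracy while spending fewer thinking tokens.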

Stefan Vasilev (@stefanvasilev_) 's Twitter Profile Photo

I am delighted to share that my paper has been accepted to ACL Findings! 🎉 #ACL2025 ACL 2025 Our work "Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation" proposes a simple yet effective SOTA method for machine unlearning for LLMs. (1/2)

Yuhui Xu (@xyh6666) 's Twitter Profile Photo

🚀 We've been exploring long CoT reasoning models for quite a while. Today, we're excited to share a systematic framework that redefines how to reason efficiently with LLMs: 📌 Fractured Sampling — a unified strategy for parallel thinking at inference time.