Baohao Liao (@baohao_liao) 's Twitter Profile
Baohao Liao

@baohao_liao

Interning @Meta; PhD @UvA_Amsterdam. Previously studied at @RWTH and @sjtu1896

ID: 1413795317460443136

Link: https://sites.google.com/view/baohaoliao · Joined: 10-07-2021 09:40:58

198 Tweets

224 Followers

387 Following

Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo


Check out our work on Reward-Guided Speculative Decoding! 🚀
• Use PRM for reward-guided sampling — a mixture distribution
• Prove binary weighting is optimal under budget constraints
• Saves 4.4× FLOPs in STEM
• Outperforms speculative decoding 🔥💡

arxiv.org/pdf/2501.19324
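Not from the paper itself — a toy sketch of the binary-weighting idea from the tweet, with stub `draft_step`/`target_step`/`prm_score` functions standing in for the real draft LLM, target LLM, and process reward model:

```python
import random

random.seed(0)

# Hypothetical stand-ins for the three models involved.
def draft_step(prefix):          # cheap draft-model proposal
    return prefix + "d"

def target_step(prefix):         # expensive target-model generation
    return prefix + "T"

def prm_score(prefix, step):     # PRM judges the proposed step
    return random.random()

def reward_guided_decode(prompt, n_steps, threshold=0.5):
    """Binary weighting: keep the cheap draft step when its PRM score
    clears the threshold, otherwise pay for a target-model step."""
    out, target_calls = prompt, 0
    for _ in range(n_steps):
        cand = draft_step(out)
        if prm_score(out, cand) >= threshold:
            out = cand               # accept the draft step for free
        else:
            out = target_step(out)   # fall back to the big model
            target_calls += 1
    return out, target_calls

text, calls = reward_guided_decode("Q: ", n_steps=10)
print(text, "| target calls:", calls)
```

The FLOP savings come from `target_calls` staying well below `n_steps` whenever the draft model's steps score high enough.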
Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Impressed by DeepSeek-R1 and o3? They are long-reasoning models, though, and often generate >4k tokens for hard questions. That's very time-consuming! Here is our solution to speed up their inference! #deepseek #DeepSeekR1 #o1 #o3 #reward

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Sample-efficient finetuning sounds great, but it can mislead people. Including instruction-following data in the pretraining stage is quite common now, so the base model is not actually "base": obtaining AIME-level results from ~1k samples is more like format following

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Since ARR has been reduced from 6 cycles per year to 5, shouldn't we extend the author response period? Active and thorough author-reviewer discussion is important!!! #ACL #EMNLP #ARR

Katia Shutova (@katiashutova) 's Twitter Profile Photo

Come and join us at AmsterdamNLP! We have two open PhD positions in #NLProc with a focus on multilingual NLP and LLM alignment. Looking for students with an NLP/ML background and an interest in language and society. werkenbij.uva.nl/en/vacancies/t…

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

Unbelievable: 2 papers with 7 reviews for #ARR Feb, and only one reviewer replied. The funny thing is, they mistook a baseline for our method in the new experiments and claimed our method is not good enough 😅 It seems the review system is falling apart! ACL 2025 #ACL

Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo

🤖 What makes GRPO work? Rejection Sampling → Reinforce → GRPO
- RS is underrated
- Key of GRPO: it implicitly removes prompts without a correct answer
- Reinforce + Filtering > GRPO (better KL)
💻 github.com/RLHFlow/Minima…
📄 arxiv.org/abs/2504.11343
👀 RAFT was invited to ICLR25! Come & chat ☕️
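A minimal sketch of the filtering step the tweet credits (not the authors' code): with binary rewards over a group of samples per prompt, drop groups with no correct answer — and all-correct groups, which also carry zero group-normalized advantage:

```python
def filter_groups(groups):
    """Keep only prompt groups with mixed rewards. All-zero groups
    ("prompts without correct answer") and all-one groups contribute
    no learning signal under group-normalized advantages."""
    kept = []
    for rewards in groups:
        if any(r == 1 for r in rewards) and any(r == 0 for r in rewards):
            kept.append(rewards)
    return kept

# Each inner list: binary rewards for k sampled answers to one prompt.
groups = [[0, 0, 0, 0],   # no correct answer -> removed
          [1, 0, 1, 0],   # mixed -> kept, useful gradient
          [1, 1, 1, 1]]   # all correct -> removed (zero advantage)
print(filter_groups(groups))  # -> [[1, 0, 1, 0]]
```

Reinforce would then train only on the surviving groups, which is the "Reinforce + Filtering" recipe the tweet compares against GRPO.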

Baohao Liao (@baohao_liao) 's Twitter Profile Photo

It’s been two weeks since I joined the Llama Agent team at AI at Meta as an intern. #Agent is more interesting than I thought! Excited for the journey ahead.

Yan Meng (@vivian_yanmy) 's Twitter Profile Photo

Tomorrow I will present our paper about data quality for MT at 9:00 AM in Hall 3 at #NAACL2025. Happy to meet you there :)

Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo

How to improve test-time scalability?
- Separate thinking & solution phases to control performance under a budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
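A toy illustration of the budget-constrained idea (not the paper's implementation), assuming a hypothetical `</think>` delimiter separating the thinking and solution phases:

```python
THINK_END = "</think>"

def budget_truncate(tokens, budget):
    """If the thinking phase exceeds the token budget, cut it off and
    close it, forcing the model into its solution phase so it always
    answers within budget."""
    if THINK_END in tokens[:budget]:
        return tokens                          # thinking finished in budget
    return tokens[:budget] + [THINK_END]       # truncate, force solution

trace = ["step1", "step2", "step3", "</think>", "answer"]
print(budget_truncate(trace, 2))   # -> ['step1', 'step2', '</think>']
print(budget_truncate(trace, 10))  # -> unchanged, thinking fit the budget
```

Training with rollouts truncated this way is what lets the model keep its accuracy while spending fewer thinking tokens.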

Stefan Vasilev (@stefanvasilev_) 's Twitter Profile Photo

I am delighted to share that my paper has been accepted to ACL Findings! 🎉 #ACL2025 ACL 2025 Our work "Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation" proposes a simple yet effective SOTA method for machine unlearning for LLMs. (1/2)

Yuhui Xu (@xyh6666) 's Twitter Profile Photo

🚀 We've been exploring long CoT reasoning models for quite a while. Today, we're excited to share a systematic framework that redefines how to reason efficiently with LLMs: 📌 Fractured Sampling — a unified strategy for parallel thinking at inference time.