Audrey Huang (@auddery)'s Twitter Profile
Audrey Huang

@auddery

ID: 1790453938724171779

Link: https://audhuang.github.io/ · Joined: 14-05-2024 18:47:43

7 Tweets

86 Followers

60 Following

Yuda Song @ ICLR 2025 (@yus167)'s Twitter Profile Photo

New work on understanding preference fine-tuning/RLHF -- we analyze online and offline preference fine-tuning methods via the theoretical tool of dataset coverage and reveal the importance of online unlabeled data. Plus, a new algorithm! (1/n)

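As background for the setting this thread analyzes: preference fine-tuning typically models pairwise labels with the standard Bradley-Terry model, which turns a reward difference into a preference probability. A minimal sketch of that standard background (illustrative only, not code from the paper):

```python
import math

def bradley_terry_pref_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry preference model, standard in RLHF: the probability
    that the first response is preferred is a logistic function of the
    reward difference between the two responses."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
```

Equal rewards give probability 0.5; a large reward gap pushes the preference probability toward 1.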
Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

New preprint: Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning We show that good old fashioned behavior cloning enjoys horizon-independent sample complexity for imitation learning—provided you use the log loss! arxiv.org/abs/2407.15007 Thread below
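The "log loss" the tweet highlights is simply the negative log-likelihood of the expert's actions under the learned policy, as opposed to, say, a squared or 0-1 loss. A minimal sketch of that objective (hypothetical names, not the paper's code):

```python
import math

def log_loss_bc(policy_probs, expert_actions):
    """Behavior cloning with the log loss: minimize the negative
    log-likelihood the policy assigns to the expert's actions.
    policy_probs[t][a] is the policy's probability of action a at step t;
    expert_actions[t] is the expert's action at step t."""
    return -sum(math.log(policy_probs[t][a])
                for t, a in enumerate(expert_actions))
```

For example, a two-step trajectory where the policy puts probability 0.9 and 0.8 on the expert's actions incurs loss -(ln 0.9 + ln 0.8) ≈ 0.33.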

Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

Given a high-quality verifier, language model accuracy can be improved by scaling inference-time compute (e.g., w/ repeated sampling). When can we expect similar gains without an external verifier? New paper: Self-Improvement in Language Models: The Sharpening Mechanism

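The repeated-sampling scheme the tweet refers to is usually implemented as Best-of-N: draw N candidates and keep the one the scorer ranks highest. With an external verifier the scorer is the verifier; in the self-improvement setting the model scores its own outputs (e.g., by sequence log-probability). A minimal sketch under those assumptions, where `generate` and `score` are hypothetical stand-ins for a sampler and a scoring function:

```python
def best_of_n(generate, score, n: int):
    """Best-of-N sampling: draw n candidate responses and return the
    highest-scoring one. `score` may be an external verifier or, in the
    self-improvement setting, the model's own assessment of a response."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```

Accuracy gains from scaling n hinge on how reliably `score` ranks correct responses above incorrect ones.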
Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

Check out the paper for more details: arxiv.org/abs/2412.01951 Joint work w/ Audrey Huang, Adam Block, Dhruv Rohatgi, Cyril Zhang, Max Simchowitz, Jordan Ash, and Akshay Krishnamurthy

Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

Akshay presenting InferenceTimePessimism, a new alternative to BoN sampling for scaling test-time compute. From our recent paper here: arxiv.org/abs/2503.21878

Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

Is Best-of-N really the best we can do for language model inference? New algo & paper: 🚨InferenceTimePessimism🚨 Led by the amazing Audrey Huang with Adam Block, Qinghua Liu, Nan Jiang, and Akshay Krishnamurthy. Appearing at ICML '25. 1/11

Nived Rajaraman (@nived_rajaraman)'s Twitter Profile Photo

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! 🗓️ Deadline: May 19, 2025

Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

RL and post-training play a central role in giving language models advanced reasoning capabilities, but many algorithmic and scientific questions remain unanswered. Join us at FoPT @ COLT '25 to explore pressing emerging challenges and opportunities for theory to bring clarity.