Zhiyuan Li (@zhiyuanli_) Twitter Tweets • TwiCopy

📡Join us at the 2nd workshop on Mathematics of Modern Machine Learning (M3L) at #NeurIPS2024! sites.google.com/view/m3l-2024/ Submission deadline: September 29, 2024

M3L Workshop @ NeurIPS 2024

@m3lworkshop

10 months ago

We've extended the #M3L submission deadline to October 1st AoE to align with ICLR timelines. We look forward to your work!

Exciting new work led by the amazing Kaiyue Wen on a theoretical justification for the recently popular WSD schedule! It is based on an interesting and novel assumption about the training loss, called "River Valley", which helps explain hidden progress in large-learning-rate training.

M3L Workshop @ NeurIPS 2024

@m3lworkshop

7 months ago

Hope everyone had fun at the 2nd workshop of M3L! Many thanks to the speakers, authors, reviewers, and participants for making this workshop a success. We had a full house again, and we hope to see you next year! 💡

Kaifeng Lyu

@vfleaking

4 months ago

Can we quantify the effect of learning rate schedules? Empirically, what's the best schedule for LLM pretraining? 🚀Excited to share our ICLR paper! arxiv.org/abs/2503.12811 With ≤3 runs, you can fit our empirical law and optimize your schedule—a WSD-like schedule is the best!

David Yin

@davidyin0609

3 months ago

SVRG is popular in theoretical optimization, but it has not been widely adopted to train large neural networks. In our ICLR work “A Coefficient Makes SVRG Effective”, we show that adding a coefficient helps SVRG optimize deep neural networks. arxiv.org/abs/2311.05589

Nikunj Saunshi

@nsaunshi

3 months ago

Don't miss the poster presentation for this by Nishanth Dikkala at #ICLR2025 tomorrow to learn more about our work on looped Transformers for reasoning! Poster #272: Hall 3 + 2B. Sat 26th April, 10am - 12:30pm Singapore time

Zhiyuan Li

@zhiyuanli_

3 months ago

Excited to share our new method ✏️PENCIL! It decouples space complexity from time complexity in LLM reasoning by allowing the model to recursively erase and generate thoughts. Joint work with my student Chenxiao Yang, along with Nati Srebro Bartom and David McAllester.
