Kwangjun Ahn (@kwangjuna) Twitter Tweets

Kwangjun Ahn

@kwangjuna

2 years ago

Check out Xiang Cheng’s talk on our linear transformer works given at Simons Institute!! youtube.com/live/PnwC74s1n…

Likes: 50 · Replies: 0 · Retweets: 7

Ahmad Beirami @ ICLR 2025

@abeirami

2 years ago

If you're at #NeurIPS2023, Kwangjun Ahn will be presenting his work on SpecTr++ at the Optimal Transport workshop, where he discusses improved transport plans for speculative decoding.

Likes: 17 · Replies: 0 · Retweets: 3

Aaron Defazio

Exciting new paper by Kwangjun Ahn and Ashok Cutkosky! "Adam with model exponential moving average is effective for nonconvex optimization" (arxiv.org/pdf/2405.18199). This approach to analyzing Adam is extremely promising IMHO.

Likes: 88 · Replies: 3 · Retweets: 13
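For readers new to the setup named in that paper title (Adam combined with an exponential moving average of the model weights), here is a minimal sketch of textbook Adam with bias correction that also tracks a parameter EMA. It assumes plain Python lists, a deterministic gradient oracle `grad`, and an illustrative EMA decay `ema_decay`; the function name `adam_with_ema` and its parameters are hypothetical, and the exact variant analyzed in the paper may differ in its details.

```python
import math


def adam_with_ema(grad, w0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
                  ema_decay=0.99, steps=1000):
    """Textbook Adam (with bias correction) that also tracks an EMA of the weights.

    Hypothetical helper for illustration. Returns (w, ema): the raw last
    iterate and the exponential moving average of the iterates.
    """
    w = list(w0)
    m = [0.0] * len(w)   # first-moment (momentum) estimate
    v = [0.0] * len(w)   # second-moment estimate
    ema = list(w0)       # model EMA, initialized at the starting point
    for t in range(1, steps + 1):
        g = grad(w)      # gradient at the current (raw) iterate
        for i in range(len(w)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected moments
            v_hat = v[i] / (1 - beta2 ** t)
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
            # Model EMA: a slowly moving average of the iterates, updated after each step.
            ema[i] = ema_decay * ema[i] + (1 - ema_decay) * w[i]
    return w, ema


# Toy quadratic (an assumption for illustration) with its minimum at (1, -2).
w, ema = adam_with_ema(lambda p: [p[0] - 1.0, p[1] + 2.0], [0.0, 0.0])
print(w, ema)
```

In practice the EMA copy (`ema`), not the raw iterate `w`, is what one would typically evaluate; the decay of 0.99 here is an arbitrary illustrative choice.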

Kwangjun Ahn

@kwangjuna

a year ago

I successfully defended my thesis at MIT EECS yesterday! A huge thank you to my advisors, Suvrit and Ali, and my committee member Ashia! It covers my recent work on Transformers and Adam; for those who are interested, check out the video: youtu.be/5rgrB7TGPdc

Likes: 268 · Replies: 14 · Retweets: 10

Kwangjun Ahn

@kwangjuna

a year ago

In our ICML 2024 paper, joint w/ Zhiyu Zhang, Yunbum Kook, and Yan Dai, we provide a new perspective on the Adam optimizer based on online learning. In particular, our perspective shows the importance of Adam's key components. (video: youtu.be/AU39SNkkIsA)

Likes: 71 · Replies: 0 · Retweets: 13

Sham Kakade

@shamkakade6

a year ago

What's the opt optimizer? New work comparing (diagonally conditioned) first order methods.

Likes: 7 · Replies: 0 · Retweets: 2

Kwangjun Ahn

@kwangjuna

a year ago

Come to my presentation of our ICML 2024 paper tmrw at 1:30–3 pm! We provide a new perspective on the Adam optimizer based on online learning. In particular, our perspective shows the importance of Adam's key components. (video: youtu.be/AU39SNkkIsA)

Likes: 84 · Replies: 0 · Retweets: 14

John Langford

@johnclangford

a year ago

New reqs for low- to high-level researcher positions: jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, jobs.careers.microsoft.com/global/en/job/…, with postdocs from Akshay and Miro Dudik x.com/MiroDudik/stat… . Please apply or pass to those who may :-)

Likes: 108 · Replies: 0 · Retweets: 33

John Langford

@johnclangford

a year ago

Last year, we had offers accepted from Kwangjun Ahn, Riashat Islam, Tim Pearce, and Pratyusha Sharma, while Akshay and Miro Dudik hired 7(!) postdocs.

Likes: 12 · Replies: 0 · Retweets: 2

John Langford

@johnclangford

7 months ago

The Belief State Transformer edwardshu.com/bst-website/ is at ICLR this week. The BST objective efficiently creates compact belief states: summaries of the past sufficient for all future predictions. See the short talk: microsoft.com/en-us/research… and mgostIH for further discussion.

Likes: 104 · Replies: 5 · Retweets: 19

Kwangjun Ahn

@kwangjuna

7 months ago

ICLR: Edward Hu and I will be presenting our work "The Belief State Transformer" at the 1st poster session (#269). Please come check it out! (github: github.com/microsoft/BST)

Likes: 15 · Replies: 0 · Retweets: 0

You Jiacheng

@youjiacheng

5 months ago

Kevin Frans, Depen Morwani, Kwangjun Ahn, Nikhil Vyas
Oh, I found them: linear warmup and then constant

Likes: 11 · Replies: 1 · Retweets: 1

Jeremy Bernstein

@jxbz

5 months ago

elie, noahamsel, Robert M. Gower 🇺🇦
And also Dion by Kwangjun Ahn, John Langford et al.: arxiv.org/abs/2504.05295

Likes: 18 · Replies: 1 · Retweets: 2

Konstantin Mishchenko

@konstmish

4 months ago

Schedule-Free methods, which forgo cosine/linear schedulers by averaging iterates and computing gradients at interpolated points, yield smoother training curves. It's still unclear why they work well, and this paper explains the phenomenon through the river-valley loss landscape.

Likes: 141 · Replies: 4 · Retweets: 19
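The mechanism described in the tweet above (a constant step size with no cosine/linear decay, iterate averaging, and gradients taken at an interpolated point) can be sketched for plain SGD as follows. This is a minimal sketch assuming the uniform-weight form of Schedule-Free SGD with interpolation weight `beta` (0.9 is used as an illustrative default); warmup, weight decay, and stochastic gradients are omitted, and the name `schedule_free_sgd` is hypothetical, not the API of any released implementation.

```python
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=1000):
    """Minimal Schedule-Free SGD sketch (hypothetical helper): constant lr, no decay.

    z: base SGD iterate (takes the gradient steps)
    x: uniform running average of the z iterates (the point you evaluate)
    y: interpolation between z and x where the gradient is computed
    """
    z = x0
    x = x0
    for t in range(1, steps + 1):
        # Gradient is evaluated at an interpolated point, not at z or x directly.
        y = (1 - beta) * z + beta * x
        # Plain SGD step on the base sequence with a constant learning rate.
        z = z - lr * grad(y)
        # Online averaging: weight 1/(t+1) keeps x equal to the mean of all z iterates so far.
        c = 1.0 / (t + 1)
        x = (1 - c) * x + c * z
    return x


# Toy problem (an assumption for illustration): f(w) = (w - 3)^2 / 2, so grad(w) = w - 3.
x_final = schedule_free_sgd(lambda w: w - 3.0, x0=0.0, steps=2000)
print(x_final)
```

Note that `lr` is never decayed: the averaging in `x` plays the role a decaying schedule would otherwise play, which is why such methods can run without choosing a training horizon in advance.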

Gagik Magakyan

@gagmagakyan

4 months ago

If you are at ICML 2025, come check out our oral presentation about the non-convex theory of Schedule-Free SGD in the Optimization session tomorrow! This work was done with amazing collaborators Kwangjun Ahn and Ashok Cutkosky.

Likes: 3 · Replies: 1 · Retweets: 1

Kwangjun Ahn

@kwangjuna

4 months ago

ICML: come check out our Oral Presentation on the theory of schedule-free training, based on an elegant online learning framework!

Likes: 45 · Replies: 0 · Retweets: 6

Seungwook Han

@seungwookh

4 months ago

But actually this is the OG way of doing it, and you should stop by E-2103 to see Jeremy Bernstein and Laker Newhouse whiteboard the whole paper.

Likes: 75 · Replies: 1 · Retweets: 6

Mikhail Parakhin

@mparakhin

4 months ago

Since nobody asked :-), here is my list of papers not to be missed from ICML: 1) Dion: distributed orthonormalized updates (well, technically not at ICML, but everyone's talking about it). 2) MARS: Unleashing the Power of Variance Reduction for Training Large Models 3) ...

Likes: 432 · Replies: 6 · Retweets: 32

John Langford

@johnclangford

4 months ago

Apparently Dion is now being worked on for TorchTitan: github.com/pytorch/torcht… :-)

Likes: 104 · Replies: 0 · Retweets: 8

Laker Newhouse

@lakernewhouse

4 months ago

[1/6] Curious about Muon, but not sure where to start? I wrote a 3-part blog series called “Understanding Muon” designed to get you up to speed—with The Matrix references, annotated source code, and thoughts on where Muon might be going.

Likes: 314 · Replies: 7 · Retweets: 39