
Yushun Zhang
@ericzhang0410
PhD student at The Chinese University of Hong Kong, Shenzhen, China.
Working on optimization and LLMs. zyushun.github.io
ID: 1239780017040580610
17-03-2020 05:06:12
326 Tweets
279 Followers
357 Following



Check out this excellent work led by Dmitry Rybin! We discovered a new algorithm that computes the matrix product XX^t with 5% fewer multiplications.





Holy shit. Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spikes. Muon has officially scaled to the 1-trillion-parameter LLM level. Many doubted it could scale, but here we are. So proud of the Muon team: Keller Jordan, Vlado Boza, You Jiacheng,


Awesome! Kaiyue Wen, this is related to our earlier discussion.

