Maciej Kilian's (@kilian_maciej) Twitter Profile
Maciej Kilian

@kilian_maciej

intelligence flows ; founding team @perceptroninc

ID: 1348734194705428481

Link: http://github.com/iejMac · Joined: 11-01-2021 20:51:12

218 Tweets

668 Followers

151 Following

Akshat Shrivastava's (@akshats07) Twitter Profile Photo

Excited to see further studies into early fusion vs late fusion models, in particular a great analysis into multimodal MoE’s aligned with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented

Maciej Kilian's (@kilian_maciej) Twitter Profile Photo

very cool. we found similar results in diffusion model training where EMA on model weights & const LR is more common. section 5.3 arxiv.org/pdf/2405.13218

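The EMA-on-weights trick mentioned in the tweet above can be sketched as follows. This is a minimal illustration, not code from the linked paper; the decay value and the dict-of-weights representation are assumptions for the example.

```python
# Minimal sketch: exponential moving average (EMA) of model weights, as
# commonly maintained alongside the raw weights during diffusion training.
# decay=0.999 is a typical but hypothetical choice.

def ema_update(ema_weights, model_weights, decay=0.999):
    """In-place EMA step: ema <- decay * ema + (1 - decay) * model."""
    for name, w in model_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * w
    return ema_weights


# After each optimizer step, the EMA copy is updated; the EMA weights
# (not the raw weights) are typically used for sampling/evaluation.
ema = {"w": 0.0}
ema_update(ema, {"w": 1.0}, decay=0.9)
```

The EMA copy smooths over the noise of individual gradient steps, which is one reason it pairs naturally with a constant learning rate rather than a decaying schedule.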
Sam Altman's (@sama) Twitter Profile Photo

i think we should stop arguing about what year AGI will arrive and start arguing about what year the first self-replicating spaceship will take off

Jeremy Bernstein's (@jxbz) Twitter Profile Photo

Laker and I are presenting this work in an hour at ICML poster E-2103. It’s on a theoretical framework and language (modula) for optimizers that are fast (like Shampoo) and scalable (like muP). You can think of modula as Muon extended to general layer types and network topologies
