Anastasiia Filippova🇺🇦 (@nasfilippova)'s Twitter Profile
Anastasiia Filippova🇺🇦

@nasfilippova

Apple🍏, WorldQuant, EPFL, MIPT

ID: 1585699311299477504

Link: https://anasfil.io · Joined: 27-10-2022 18:26:10

140 Tweets

388 Followers

235 Following

Lewis Tunstall (@_lewtun)'s Twitter Profile Photo

We took a deep dive into the DeepSeek R1 tech report today at Hugging Face and recorded the discussion :) Let me know if you'd like us to publish our journal club more often! youtu.be/1xDVbu-WaFo

Anastasiia Filippova🇺🇦 (@nasfilippova)'s Twitter Profile Photo

Thrilled to share that our work No Need to Talk: Asynchronous Mixture of Language Models [arxiv.org/abs/2410.03529] has been accepted to #ICLR2025! In this paper, we explore strategies to mitigate the communication cost of large language models, both at training and inference.

Awni Hannun (@awnihannun)'s Twitter Profile Photo

DeepSeek R1 (the full 680B model) runs nicely in higher quality 4-bit on 3 M2 Ultras with MLX. Asked it a coding question and it thought for ~2k tokens and generated 3500 tokens overall:

Awni Hannun (@awnihannun)'s Twitter Profile Photo

The DeepSeek V3 model file is ~450 lines of code in MLX LM. Includes pipeline-parallelism and all. Good way to see how it all works.

Samira Abnar (@samira_abnar)'s Twitter Profile Photo

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:

Rylan Schaeffer (@rylanschaeffer)'s Twitter Profile Photo

I'm going to catch hell for posting, but to summarize:
1. This paper misled its way to an #ICLR2025 Oral
2. I pointed this out
3. The AC rejected the paper
4. The authors complained & somehow persuaded ICLR to overrule the AC and award a Spotlight
5. The AC made clear they were overruled