Deepak Narayanan (@deepakn94)'s Twitter Profile
Deepak Narayanan

@deepakn94

Research Scientist at @nvidia. Interested in the intersection of Computer Systems and ML. Occasionally tweet about sports. Views are my own.

ID: 489531820

Link: https://deepakn94.github.io/ · Joined: 11-02-2012 16:37:17

402 Tweets

1.1K Followers

1.1K Following

Julien Launay (@slippylolo)'s Twitter Profile Photo

Cool recent MLSys paper on MoE brought up by Deepak Narayanan at the ES-FoMo@ICML2025 workshop: arxiv.org/abs/2211.15841 ➡️ Casts MoE as block-sparse operations, enabling highly efficient GPU implementation.
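The core idea referenced above (casting MoE computation as block-sparse operations) can be illustrated with a toy sketch: if tokens are permuted so that each expert's tokens form a contiguous block, the per-expert feed-forward computations become dense blocks of one large block-sparse matmul, and no tokens need to be dropped. This is only a minimal NumPy illustration with made-up sizes and top-1 routing, not the paper's GPU kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts = 8, 4, 2  # toy sizes for illustration

x = rng.standard_normal((num_tokens, d_model))
# one weight matrix per expert (hypothetical single-layer "experts")
w = rng.standard_normal((num_experts, d_model, d_model))
# top-1 routing: each token is assigned to one expert
expert_ids = rng.integers(0, num_experts, size=num_tokens)

# Permute tokens so each expert's tokens are contiguous; the per-expert
# matmuls below then correspond to the dense blocks of a block-sparse
# operation over all tokens (none are dropped or padded).
order = np.argsort(expert_ids, kind="stable")
x_sorted = x[order]
counts = np.bincount(expert_ids, minlength=num_experts)

out_sorted = np.empty_like(x_sorted)
start = 0
for e in range(num_experts):
    end = start + counts[e]
    out_sorted[start:end] = x_sorted[start:end] @ w[e]
    start = end

# un-permute the outputs back to the original token order
out = np.empty_like(out_sorted)
out[order] = out_sorted
```

The result matches applying each token's assigned expert individually; the payoff in practice is that the grouped, contiguous layout maps onto efficient block-sparse GPU kernels.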

Bryan Catanzaro (@ctnzr)'s Twitter Profile Photo

An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention layers; the rest is Mamba2
* MMLU jumps from 50% to 53.6%
* Training efficiency is the same
* Inference cost is much lower
arxiv.org/pdf/2406.07887

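A "7% attention, rest Mamba2" hybrid simply means only a small fraction of the stack's layers are self-attention, interleaved among SSM layers. As a rough sketch of how such a layer pattern might be constructed (the function name, the even-spacing heuristic, and the exact positions are assumptions for illustration, not the paper's recipe):

```python
def hybrid_layer_pattern(num_layers, attention_fraction=0.07):
    """Build a hypothetical hybrid stack: mostly Mamba2 layers,
    with ~attention_fraction of layers replaced by self-attention,
    spread evenly through the depth."""
    num_attention = max(1, round(num_layers * attention_fraction))
    stride = num_layers / num_attention
    # place each attention layer near the middle of its segment
    attn_positions = {int(i * stride + stride / 2) for i in range(num_attention)}
    return ["attention" if i in attn_positions else "mamba2"
            for i in range(num_layers)]

# e.g. a 56-layer stack yields 4 attention layers (~7%) among 52 Mamba2 layers
pattern = hybrid_layer_pattern(56)
```

Because attention layers are the ones with cost quadratic in sequence length (and with a KV cache at inference), keeping them to a small fraction is what drives the much lower inference cost noted in the tweet.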