Deepak Narayanan (@deepakn94)'s Twitter Profile
Deepak Narayanan

@deepakn94

Research Scientist at @nvidia. Interested in the intersection of Computer Systems and ML. Occasionally tweet about sports. Views are my own.

ID: 489531820

Link: https://deepakn94.github.io/ · Joined: 11-02-2012 16:37:17

402 Tweets

1.1K Followers

1.1K Following

Julien Launay (@slippylolo)'s Twitter Profile Photo

Cool recent MLSys paper on MoE brought up by Deepak Narayanan at the ES-FoMo@ICML2025 workshop: arxiv.org/abs/2211.15841 ➡️ Casts MoE as block-sparse operations, enabling highly efficient GPU implementation.
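The core idea referenced above (casting MoE computation as block-sparse operations) can be illustrated with a toy sketch: if tokens are permuted so that each expert's tokens form a contiguous block, the per-expert feed-forward computations become dense blocks of one large block-sparse matmul, and no tokens need to be dropped. This is only a minimal NumPy illustration with made-up sizes and top-1 routing, not the paper's GPU kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts = 8, 4, 2  # toy sizes for illustration

x = rng.standard_normal((num_tokens, d_model))
# one weight matrix per expert (hypothetical single-layer "experts")
w = rng.standard_normal((num_experts, d_model, d_model))
# top-1 routing: each token is assigned to one expert
expert_ids = rng.integers(0, num_experts, size=num_tokens)

# Permute tokens so each expert's tokens are contiguous; the per-expert
# matmuls below then correspond to the dense blocks of a block-sparse
# operation over all tokens (none are dropped or padded).
order = np.argsort(expert_ids, kind="stable")
x_sorted = x[order]
counts = np.bincount(expert_ids, minlength=num_experts)

out_sorted = np.empty_like(x_sorted)
start = 0
for e in range(num_experts):
    end = start + counts[e]
    out_sorted[start:end] = x_sorted[start:end] @ w[e]
    start = end

# un-permute the outputs back to the original token order
out = np.empty_like(out_sorted)
out[order] = out_sorted
```

The result matches applying each token's assigned expert individually; the payoff in practice is that the grouped, contiguous layout maps onto efficient block-sparse GPU kernels.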

Bryan Catanzaro (@ctnzr)'s Twitter Profile Photo

An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention layers; the rest is Mamba2
* MMLU jumps from 50% to 53.6%
* Training efficiency is the same
* Inference cost is much lower
arxiv.org/pdf/2406.07887

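A "7% attention, rest Mamba2" hybrid simply means only a small fraction of the stack's layers are self-attention, interleaved among SSM layers. As a rough sketch of how such a layer pattern might be constructed (the function name, the even-spacing heuristic, and the exact positions are assumptions for illustration, not the paper's recipe):

```python
def hybrid_layer_pattern(num_layers, attention_fraction=0.07):
    """Build a hypothetical hybrid stack: mostly Mamba2 layers,
    with ~attention_fraction of layers replaced by self-attention,
    spread evenly through the depth."""
    num_attention = max(1, round(num_layers * attention_fraction))
    stride = num_layers / num_attention
    # place each attention layer near the middle of its segment
    attn_positions = {int(i * stride + stride / 2) for i in range(num_attention)}
    return ["attention" if i in attn_positions else "mamba2"
            for i in range(num_layers)]

# e.g. a 56-layer stack yields 4 attention layers (~7%) among 52 Mamba2 layers
pattern = hybrid_layer_pattern(56)
```

Because attention layers are the ones with cost quadratic in sequence length (and with a KV cache at inference), keeping them to a small fraction is what drives the much lower inference cost noted in the tweet.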