Michael Zhang (@mzhangio)'s Twitter Profile
Michael Zhang

@mzhangio

CS PhD Student @hazyresearch, @StanfordAILab. Foundations of foundation models. Making them more reliable and efficient. Also do new things.

ID: 701189916387049472

http://michaelzhang.xyz · Joined 20-02-2016 23:41:18

268 Tweets

1.1K Followers

503 Following

Michael Zhang (@mzhangio)

new thoughts from the advisor, reflecting on building foundation models for X* - we got bitter-lesson / llm-pilled in our own way - many greats like math + rigor, but sometimes stupid just works better - this “clarity” might challenge how we should think about LLMs - our lab

Michael Zhang (@mzhangio)

yay more self-improving systems

use LLM to write kernels, make test-time compute cheaper

put those kernels back into the LLMs, so they can do more test-time compute + come up w even better kernels

repeat ???

Neel Guha (@neelguha)

What's (1) a "drink of fresh fruit pureed with milk, yogurt, or ice cream" and (2) an unsupervised algorithm for test-time LLM routing? Our #NeurIPS2024 paper, Smoothie! 🥤 arxiv.org/abs/2412.04692 1/9

Piero Molino (@w4nderlus7)

Today, I’m excited to unveil a project that’s incredibly close to my heart. As a lifelong gamer, I’ve always dreamed of pushing the boundaries of what’s possible in video games using my expertise in AI. Over the past year, my team at Studio Atelico and I have been blending the

Michael Zhang (@mzhangio)

good read! Jared Dunnmon yes china used llama for military* but when the nation-states start post-training their own LLMs (for military or otherwise; not everyone wants to share their prompt data) seems like it'd be nice if they built on US AI infra ? *reuters.com/technology/art…

Benjamin F Spector (@bfspector)

(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With Aaryan Singhal, Dan Fu, and @hazyresearch!

Michael Zhang (@mzhangio)

new thoughts from the adviser! Don't agree w all the aesthetic delivery, but I do believe better to have world's AI be on familiar tech I do wonder how US AI wins on consumer, and how we should probs do stuff here. China 🇨🇳 cares less about privacy + has the super-apps (++data)