Azalia Mirhoseini (@azaliamirh)'s Twitter Profile
Azalia Mirhoseini

@azaliamirh

Assistant Professor of CS at Stanford, Senior Staff Research Scientist at Google DeepMind. Prev: Anthropic, Google Brain

ID: 1469058794

Link: https://scalingintelligence.stanford.edu/ | Joined: 30-05-2013 06:13:58

298 Tweets

13.13K Followers

461 Following

Infini-AI-Lab (@infiniailab):

🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws

🤔 How to effectively build a powerful reasoning agent?

Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model.
But that is only half of the picture!

🚨 The O(N²)
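
A rough back-of-the-envelope sketch of the trade-off the thread points at. The 1.7B/64K and 32B figures come from the tweet; the cost model, layer count, and KV width below are illustrative assumptions, not numbers from the paper:

```python
# Illustrative cost model (assumptions, not the paper's):
# per-token dense compute ~ 2 * params, plus attention work that grows
# with the number of cached tokens -> an O(N^2) term in generation length N.

def generation_cost(params, n_tokens, n_layers=28, d_kv=1024):
    dense = 2 * params * n_tokens                            # parameter FLOPs
    attn = n_layers * d_kv * n_tokens * (n_tokens + 1) // 2  # O(N^2) KV/attention term
    return dense + attn

small_long = generation_cost(params=1.7e9, n_tokens=64_000)  # 1.7B model, 64K thinking tokens
big_short  = generation_cost(params=32e9,  n_tokens=4_000)   # 32B model, shorter output

print(f"1.7B @ 64K tokens: {small_long:.2e} FLOPs")
print(f"32B  @ 4K tokens:  {big_short:.2e} FLOPs")
```

Under these assumptions the quadratic attention/KV term grows to rival the dense compute at long generation lengths, eroding much of the small model's apparent advantage, which is presumably the other half of the picture the O(N²) point refers to.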
Ryan Ehrlich (@ryansehrlich):

Giving LLMs very large amounts of context can be really useful, but it can also be slow and expensive. Could scaling inference-time compute help? In our latest work, we show that allowing models to spend test-time compute to “self-study” a large corpus can >20x decode
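
The tweet is cut off, but one plausible reading of “self-study” is spending test-time compute once to condense a corpus, then decoding against the condensed form. A minimal sketch under that assumption; `generate` is a hypothetical stand-in for any LLM completion call, not the paper's API:

```python
# Hypothetical sketch of test-time "self-study" (one reading of the tweet).

def self_study_answer(generate, corpus_chunks, question):
    # Phase 1 (one-time): spend test-time compute condensing each chunk into notes.
    notes = [generate(f"Summarize the key facts:\n{chunk}") for chunk in corpus_chunks]
    digest = "\n".join(notes)
    # Phase 2 (per query): answer from the much shorter digest, so decoding
    # attends over far fewer context tokens than the raw corpus would require.
    return generate(f"Using these notes:\n{digest}\n\nAnswer this question: {question}")
```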

Hermann (@kumbonghermann):

Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week.

VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from
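
For readers new to VAR, here is a schematic of next-scale prediction; `predict_scale` is a hypothetical stand-in for the model, and this is not the HMAR implementation:

```python
import torch
import torch.nn.functional as F

# Schematic of VAR-style next-scale prediction: generate latent maps
# coarse-to-fine, conditioning each scale on the upsampled earlier scales.

def next_scale_generation(predict_scale, scales=(1, 2, 4, 8, 16), dim=16):
    canvas = torch.zeros(1, dim, scales[0], scales[0])
    for s in scales:
        cond = F.interpolate(canvas, size=(s, s), mode="nearest")  # upsample context
        canvas = cond + predict_scale(cond, s)  # model adds residual detail at scale s
    return canvas  # final latent map; VAR decodes it to pixels with a VQ-VAE decoder
```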
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex):

I like this idea very much and have long advocated for something like this. A synthetically enriched «KV prefix» is a natural augmentation to modern long-context models.

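A minimal sketch of what a synthetically enriched KV prefix could mean in practice. All three callables (`generate`, `prefill`, `decode_with_cache`) are hypothetical stand-ins for an LLM completion call, a cache-building prefill pass, and decoding against a cached prefix:

```python
# Hypothetical sketch: pay once to synthesize notes and prefill their KV cache,
# then reuse that cache across queries so the enrichment is free at decode time.

def build_enriched_prefix(generate, prefill, corpus):
    notes = generate(f"Write dense, factual study notes for:\n{corpus}")
    return prefill(notes)  # reusable KV cache covering the synthetic notes

def answer_with_prefix(decode_with_cache, kv_prefix, question):
    # Per-query work only decodes the question against the cached prefix;
    # the synthetic enrichment itself is never recomputed.
    return decode_with_cache(kv_prefix, question)
```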
Soumith Chintala (@soumithchintala):

This is a proper vibe-coding setup for GPU programmers, and it can get you surprisingly far! I honestly think that if this authoring experience is v1, then v10 might become the normal way GPU experts start writing serious custom kernels! Great work Anne Ouyang! (finally

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Shrinking the Generation-Verification Gap with Weak Verifiers

"we introduce Weaver, a framework for designing a strong verifier by combining multiple weak, imperfect verifiers."

"Weaver leverages weak supervision to estimate each verifier’s accuracy and combines their outputs
Alex Ratner (@ajratner):

Very exciting work on using weak supervision for RL: closing the “generation-verification gap”! Once again, principled approaches to labeling and data development are the key!

Oscar Hong (@oscrhong):

Interesting tidbit from Prof. Christopher Manning: the first mention of “Large Language Model” comes from a 1998 NLP workshop in Taiwan!

Paper by Chun-Liang Chen, Bo-Ren Bai, Lee-Feng Chien, Lin-Shan Lee.

“Large” in 1998 = a 20M-word corpus
Azalia Mirhoseini (@azaliamirh):

See Jon Saad-Falcon's post for more details: x.com/JonSaadFalcon/…
Paper: arxiv.org/abs/2506.18203
Blog: hazyresearch.stanford.edu/blog/2025-06-1…
github.com/HazyResearch/s…
Datasets and Models: huggingface.co/collections/ha…

Christopher Manning (@chrmanning):

I’ve joined AIX Ventures as a General Partner, where I'll be investing in deep AI startups. Looking forward to working with founders on solving hard problems in AI and seeing products come out of that! Thank you Yuliya Chernova at The Wall Street Journal for covering the news: wsj.com/articles/ai-re…