Ayush Chakravarthy (@achakravarthy01)'s Twitter Profile
Ayush Chakravarthy

@achakravarthy01

@StanfordAILab @StanfordNLP

ID: 1874707132639252480

Joined: 02-01-2025 06:39:55

2 Tweets

66 Followers

1.1K Following

Ayush Chakravarthy (@achakravarthy01)

Found out why some AI models learn from RL while others just stare blankly: turns out you need to show them the reasoning 'moves' first! We isolate the specific 'moves' that need to be introduced in a model's training! Paper: arxiv.org/abs/2503.01307

Benjamin F Spector (@bfspector)

(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With Aaryan Singhal, Dan Fu, and @hazyresearch!

Shikhar (@shikharmurty)

New #NAACL2025 paper! 🚨 Transformer LMs are data-hungry; we propose a new auxiliary loss function (TreeReg) to fix that. TreeReg takes bracketing decisions from syntax trees and turns them into orthogonality constraints on span representations. ✅ Boosts pre-training data
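
A minimal sketch of the kind of auxiliary loss the tweet describes, assuming mean-pooled span representations and a squared-cosine penalty; the function names, pooling choice, and exact penalty are illustrative assumptions, not TreeReg's actual implementation.

```python
import torch
import torch.nn.functional as F

def span_mean(hidden, start, end):
    """Mean-pool token hidden states over a span [start, end)."""
    return hidden[start:end].mean(dim=0)

def tree_orthogonality_loss(hidden, bracket_pairs):
    """Hypothetical auxiliary loss: for each pair of spans taken from a
    syntax tree's bracketing decisions, push the two span representations
    toward orthogonality (cosine similarity near zero).

    hidden:        [seq_len, d_model] token representations
    bracket_pairs: list of ((s1, e1), (s2, e2)) span index pairs
    """
    loss = hidden.new_zeros(())
    for (s1, e1), (s2, e2) in bracket_pairs:
        a = span_mean(hidden, s1, e1)
        b = span_mean(hidden, s2, e2)
        cos = F.cosine_similarity(a, b, dim=0)
        loss = loss + cos.pow(2)  # penalize non-orthogonal span pairs
    return loss / max(len(bracket_pairs), 1)

# Added to the usual LM objective with a small weight, e.g.:
# total_loss = lm_loss + aux_weight * tree_orthogonality_loss(hidden, bracket_pairs)
```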

Benjamin F Spector (@bfspector)

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint

Omar Shaikh (@oshaikh13)

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Sabri Eyuboglu (@eyuboglusabri)

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x
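
A rough sketch of the test-time-training idea as the tweet frames it: learn a small set of key/value vectors offline that stand in for the document's full KV cache. The TrainableKVCache class, the external_kv interface, and the synthetic question data are assumptions for illustration, not the paper's recipe.

```python
import torch

class TrainableKVCache(torch.nn.Module):
    """Hypothetical compressed cache: a small number of learnable
    key/value slots per layer replacing a long document's full cache."""
    def __init__(self, n_layers, n_heads, head_dim, n_slots=128):
        super().__init__()
        shape = (n_layers, n_heads, n_slots, head_dim)
        self.k = torch.nn.Parameter(torch.randn(shape) * 0.02)
        self.v = torch.nn.Parameter(torch.randn(shape) * 0.02)

def self_study_step(model, cache, doc_qa_pairs, optimizer):
    """One illustrative 'self-study' update: a frozen model answers
    synthetic questions about the document while attending to the
    compressed cache; gradients flow only into the cache parameters.
    `model(prompt_ids, external_kv=...)` is an assumed interface."""
    optimizer.zero_grad()
    loss = torch.zeros(())
    for prompt_ids, target_ids in doc_qa_pairs:
        logits = model(prompt_ids, external_kv=(cache.k, cache.v))
        loss = loss + torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), target_ids.view(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```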

Ryan Ehrlich (@ryansehrlich)

Giving LLMs very large amounts of context can be really useful, but it can also be slow and expensive. Could scaling inference-time compute help? In our latest work, we show that allowing models to spend test-time compute to “self-study” a large corpus can >20x decode

Jon Saad-Falcon (@jonsaadfalcon)

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
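
A minimal sketch of the verifier-combination step the tweet describes: score each candidate answer with several weak verifiers and pick the best weighted score. The weighting scheme and toy verifiers below are placeholders, not Weaver's actual aggregation method.

```python
from typing import Callable, Sequence

def select_answer(
    candidates: Sequence[str],
    verifiers: Sequence[Callable[[str], float]],
    weights: Sequence[float],
) -> str:
    """Score each candidate with several weak verifiers (e.g. reward
    models or LM judges returning a scalar) and return the candidate
    with the highest weighted combined score."""
    def combined(answer: str) -> float:
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# Toy usage; real verifiers would call a reward model or a judge LM.
answers = ["x = 4", "x = 5"]
toy_verifiers = [lambda a: float("4" in a), lambda a: len(a) / 10.0]
print(select_answer(answers, toy_verifiers, weights=[0.8, 0.2]))  # -> "x = 4"
```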

Daniel Wurgaft (@danielwurgaft)

🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/

Parth Sarthi (@parthsarthi03)

With the move to Compound AI systems (built from components like finetunable/closed-source models, LLM selectors, and more), one big challenge is end-to-end optimization. Optimizing each component individually doesn't necessarily guarantee optimization of the full system. Our

Perry Dong (@perryadong)

Fine-tuning pre-trained robotic models with online RL requires a way to train RL with expressive policies. Can we design an effective method for this? We propose EXPO, a sample-efficient online RL algorithm that enables stable fine-tuning of expressive policy classes (1/6)

Jared Moore (@jaredlcm)

I'm excited to share work to appear at the Conference on Language Modeling! Theory of Mind (ToM) lets us understand others' mental states. Can LLMs go beyond predicting mental states to changing them? We introduce MINDGAMES to test Planning ToM: the ability to intervene on others' beliefs & persuade them

Jubayer Ibn Hamid (@jubayer_hamid)

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks
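
One way to make the collapse the tweet describes visible is to track the policy's average token-level entropy during training. The helper below is a generic diagnostic, not the paper's method, and the entropy-bonus note at the end is a common mitigation rather than their proposal.

```python
import torch

def mean_token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Average entropy of the policy's next-token distribution.
    During RL fine-tuning, a steadily shrinking value signals the
    policy narrowing onto a few high-probability behaviors."""
    log_probs = torch.log_softmax(logits, dim=-1)   # [batch, seq, vocab]
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)      # [batch, seq]
    return entropy.mean()

# A common (illustrative) mitigation adds an entropy bonus to the
# policy-gradient objective:  loss = pg_loss - beta * mean_token_entropy(logits)
```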

Yoonho Lee (@yoonholeee)

The standard way to improve reasoning in LLMs is to train on long chains of thought. But these traces are often brute-force and shallow. Introducing RLAD, where models instead learn _reasoning abstractions_: concise textual strategies that guide structured exploration. 1/N🧵
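
A prompting-level sketch of the two-stage idea in the tweet: first elicit a concise textual strategy (the "reasoning abstraction"), then condition the solution on it. The prompts and the generate callable are assumptions; RLAD itself trains models with RL rather than relying on fixed prompts.

```python
def solve_with_abstraction(generate, problem: str) -> str:
    """Two-stage sketch: ask for a short strategy first, then solve
    the problem conditioned on that strategy. `generate` is any
    prompt -> completion function supplied by the caller."""
    abstraction = generate(
        f"Problem: {problem}\n"
        "Give a short, general strategy (2-3 sentences) for solving "
        "problems of this kind. Do not solve it yet."
    )
    solution = generate(
        f"Problem: {problem}\n"
        f"Strategy: {abstraction}\n"
        "Now solve the problem step by step, following the strategy."
    )
    return solution
```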
