Ayush Chakravarthy (@achakravarthy01)'s Twitter Profile
Ayush Chakravarthy

@achakravarthy01

@StanfordAILab @StanfordNLP

ID: 1874707132639252480

Joined: 02-01-2025 06:39:55

2 Tweets

66 Followers

1.1K Following

Ayush Chakravarthy (@achakravarthy01)

Found out why some AI models learn from RL while others just stare blankly: turns out you need to show them the reasoning 'moves' first! We isolate the specific 'moves' that need to be introduced in a model's training! Paper: arxiv.org/abs/2503.01307

Benjamin F Spector (@bfspector)

(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With Aaryan Singhal, Dan Fu, and @hazyresearch!

Shikhar (@shikharmurty)

New #NAACL2025 paper! 🚨 Transformer LMs are data-hungry; we propose a new auxiliary loss function (TreeReg) to fix that. TreeReg takes bracketing decisions from syntax trees and turns them into orthogonality constraints on span representations. ✅ Boosts pre-training data
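
A minimal sketch of the kind of auxiliary loss the tweet describes, assuming mean-pooled span representations and a squared-cosine penalty; the function names, pooling choice, and exact penalty are illustrative assumptions, not TreeReg's actual implementation.

```python
import torch
import torch.nn.functional as F

def span_mean(hidden, start, end):
    """Mean-pool token hidden states over a span [start, end)."""
    return hidden[start:end].mean(dim=0)

def tree_orthogonality_loss(hidden, bracket_pairs):
    """Hypothetical auxiliary loss: for each pair of spans taken from a
    syntax tree's bracketing decisions, push the two span representations
    toward orthogonality (cosine similarity near zero).

    hidden:        [seq_len, d_model] token representations
    bracket_pairs: list of ((s1, e1), (s2, e2)) span index pairs
    """
    loss = hidden.new_zeros(())
    for (s1, e1), (s2, e2) in bracket_pairs:
        a = span_mean(hidden, s1, e1)
        b = span_mean(hidden, s2, e2)
        cos = F.cosine_similarity(a, b, dim=0)
        loss = loss + cos.pow(2)  # penalize non-orthogonal span pairs
    return loss / max(len(bracket_pairs), 1)

# Added to the usual LM objective with a small weight, e.g.:
# total_loss = lm_loss + aux_weight * tree_orthogonality_loss(hidden, bracket_pairs)
```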

Benjamin F Spector (@bfspector)

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint

Omar Shaikh (@oshaikh13)

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

Sabri Eyuboglu (@eyuboglusabri)

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x
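
A rough sketch of the test-time-training idea as the tweet frames it: learn a small set of key/value vectors offline that stand in for the document's full KV cache. The TrainableKVCache class, the external_kv interface, and the synthetic question data are assumptions for illustration, not the paper's recipe.

```python
import torch

class TrainableKVCache(torch.nn.Module):
    """Hypothetical compressed cache: a small number of learnable
    key/value slots per layer replacing a long document's full cache."""
    def __init__(self, n_layers, n_heads, head_dim, n_slots=128):
        super().__init__()
        shape = (n_layers, n_heads, n_slots, head_dim)
        self.k = torch.nn.Parameter(torch.randn(shape) * 0.02)
        self.v = torch.nn.Parameter(torch.randn(shape) * 0.02)

def self_study_step(model, cache, doc_qa_pairs, optimizer):
    """One illustrative 'self-study' update: a frozen model answers
    synthetic questions about the document while attending to the
    compressed cache; gradients flow only into the cache parameters.
    `model(prompt_ids, external_kv=...)` is an assumed interface."""
    optimizer.zero_grad()
    loss = torch.zeros(())
    for prompt_ids, target_ids in doc_qa_pairs:
        logits = model(prompt_ids, external_kv=(cache.k, cache.v))
        loss = loss + torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), target_ids.view(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```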

Ryan Ehrlich (@ryansehrlich)

Giving LLMs very large amounts of context can be really useful, but it can also be slow and expensive. Could scaling inference-time compute help? In our latest work, we show that allowing models to spend test-time compute to “self-study” a large corpus can >20x decode

Jon Saad-Falcon (@jonsaadfalcon)

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
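
A minimal sketch of the verifier-combination step the tweet describes: score each candidate answer with several weak verifiers and pick the best weighted score. The weighting scheme and toy verifiers below are placeholders, not Weaver's actual aggregation method.

```python
from typing import Callable, Sequence

def select_answer(
    candidates: Sequence[str],
    verifiers: Sequence[Callable[[str], float]],
    weights: Sequence[float],
) -> str:
    """Score each candidate with several weak verifiers (e.g. reward
    models or LM judges returning a scalar) and return the candidate
    with the highest weighted combined score."""
    def combined(answer: str) -> float:
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# Toy usage; real verifiers would call a reward model or a judge LM.
answers = ["x = 4", "x = 5"]
toy_verifiers = [lambda a: float("4" in a), lambda a: len(a) / 10.0]
print(select_answer(answers, toy_verifiers, weights=[0.8, 0.2]))  # -> "x = 4"
```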

Daniel Wurgaft (@danielwurgaft)

🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/

Parth Sarthi (@parthsarthi03)

With the move to Compound AI systems (built from components like finetunable/closed-source models, LLM selectors, and more), one big challenge is end-to-end optimization. Optimizing each component individually doesn't necessarily guarantee optimization of the full system. Our

Perry Dong (@perryadong)

Fine-tuning pre-trained robotic models with online RL requires a way to train RL with expressive policies. Can we design an effective method for this? We propose EXPO, a sample-efficient online RL algorithm that enables stable fine-tuning of expressive policy classes (1/6)

Jared Moore (@jaredlcm)

I'm excited to share work to appear at the Conference on Language Modeling! Theory of Mind (ToM) lets us understand others' mental states. Can LLMs go beyond predicting mental states to changing them? We introduce MINDGAMES to test Planning ToM: the ability to intervene on others' beliefs & persuade them

Jubayer Ibn Hamid (@jubayer_hamid)

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks
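
One way to make the collapse the tweet describes visible is to track the policy's average token-level entropy during training. The helper below is a generic diagnostic, not the paper's method, and the entropy-bonus note at the end is a common mitigation rather than their proposal.

```python
import torch

def mean_token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Average entropy of the policy's next-token distribution.
    During RL fine-tuning, a steadily shrinking value signals the
    policy narrowing onto a few high-probability behaviors."""
    log_probs = torch.log_softmax(logits, dim=-1)   # [batch, seq, vocab]
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)      # [batch, seq]
    return entropy.mean()

# A common (illustrative) mitigation adds an entropy bonus to the
# policy-gradient objective:  loss = pg_loss - beta * mean_token_entropy(logits)
```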

Yoonho Lee (@yoonholeee)

The standard way to improve reasoning in LLMs is to train on long chains of thought. But these traces are often brute-force and shallow. Introducing RLAD, where models instead learn _reasoning abstractions_: concise textual strategies that guide structured exploration. 1/N🧵
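
A prompting-level sketch of the two-stage idea in the tweet: first elicit a concise textual strategy (the "reasoning abstraction"), then condition the solution on it. The prompts and the generate callable are assumptions; RLAD itself trains models with RL rather than relying on fixed prompts.

```python
def solve_with_abstraction(generate, problem: str) -> str:
    """Two-stage sketch: ask for a short strategy first, then solve
    the problem conditioned on that strategy. `generate` is any
    prompt -> completion function supplied by the caller."""
    abstraction = generate(
        f"Problem: {problem}\n"
        "Give a short, general strategy (2-3 sentences) for solving "
        "problems of this kind. Do not solve it yet."
    )
    solution = generate(
        f"Problem: {problem}\n"
        f"Strategy: {abstraction}\n"
        "Now solve the problem step by step, following the strategy."
    )
    return solution
```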
