Eshaan Nichani (@eshaannichani) Tweets

Eshaan Nichani

@eshaannichani

PhD student @ Princeton University · Theoretical Machine Learning · Previously Math & CS @ MIT · he/him

ID: 1382509387755978752

Website: http://eshaannichani.com · Joined: 15-04-2021 01:41:51

54 Tweets

598 Followers

240 Following

Alberto Bietti

Come hear about how transformers perform factual recall using associative memories, and how this emerges in phases during training! #ICLR2025 poster #602 at 3pm today. Led by Eshaan Nichani. Link: iclr.cc/virtual/2025/p… Paper: arxiv.org/abs/2412.06538

47 likes · 1 reply · 9 reposts
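
For readers new to the term, a textbook linear associative memory stores key-value pairs in one matrix W = sum_i v_i k_i^T and recalls v_j by computing W k_j. The sketch below is a generic illustration under our own assumptions (random unit-norm Gaussian embeddings, d = 256, 20 stored pairs, decoding by largest inner product); it is not the construction or training setup from the linked paper, and every function name is hypothetical. Recall works because random high-dimensional embeddings are nearly orthogonal, so the cross-talk terms v_i (k_i^T k_j) for i != j stay small.

```python
import random

def random_unit_vector(d, rng):
    """Assumption: embeddings are random Gaussian vectors normalized to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_memory(keys, values):
    """Store all pairs in W = sum_i v_i k_i^T (a d x d matrix kept as a list of rows)."""
    d = len(keys[0])
    W = [[0.0] * d for _ in range(d)]
    for k, v in zip(keys, values):
        for a in range(d):
            row, va = W[a], v[a]
            for b in range(d):
                row[b] += va * k[b]
    return W

def recall(W, key, values):
    """Compute W @ key, then return the index of the most aligned value embedding."""
    out = [dot(row, key) for row in W]
    scores = [dot(out, v) for v in values]
    return max(range(len(values)), key=scores.__getitem__)

if __name__ == "__main__":
    rng = random.Random(0)
    d, n_facts = 256, 20  # hypothetical sizes chosen for illustration
    keys = [random_unit_vector(d, rng) for _ in range(n_facts)]
    values = [random_unit_vector(d, rng) for _ in range(n_facts)]
    W = build_memory(keys, values)
    hits = sum(recall(W, keys[i], values) == i for i in range(n_facts))
    print(f"recalled {hits}/{n_facts} facts")
```

Shrinking d or storing many more pairs increases the cross-talk, which is one way to see why storage capacity depends on dimension.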

Jason Lee

@jasondeanlee

3 months ago

Our new work on scaling laws that includes compute, model size, and number of samples. The analysis involves an extremely fine-grained study of online SGD, built up over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models).

149 likes · 1 reply · 15 reposts
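
As a rough illustration of the "online SGD on single-index models" setting mentioned above, here is a minimal sketch. It assumes a noiseless single-index target y = tanh(<w*, x>) with Gaussian inputs, a model of the same form, the square loss, and one fresh sample per step (which is what makes the SGD "online"). The link function, step size, step count, and function names are our own hypothetical choices; this does not reproduce the scaling-law analysis in the announced work.

```python
import math
import random

def sample(d, w_star, rng):
    """Draw x ~ N(0, I_d) and the noiseless label y = tanh(<w_star, x>)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = math.tanh(sum(a * b for a, b in zip(w_star, x)))
    return x, y

def online_sgd(d, w_star, steps, lr, rng):
    """Online SGD: each step uses a brand-new sample, loss 0.5 * (tanh(<w, x>) - y)^2."""
    w = [0.0] * d
    for _ in range(steps):
        x, y = sample(d, w_star, rng)
        pred = math.tanh(sum(a * b for a, b in zip(w, x)))
        # Gradient of the loss w.r.t. w is (pred - y) * (1 - pred^2) * x.
        coef = lr * (pred - y) * (1.0 - pred * pred)
        w = [wi - coef * xi for wi, xi in zip(w, x)]
    return w

def cosine(u, v):
    uv = sum(a * b for a, b in zip(u, v))
    return uv / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

if __name__ == "__main__":
    rng = random.Random(0)
    d = 20
    w_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(a * a for a in w_star))
    w_star = [a / norm for a in w_star]  # unit-norm target direction
    w = online_sgd(d, w_star, steps=20_000, lr=0.01, rng=rng)
    print(f"cosine(w, w_star) = {cosine(w, w_star):.3f}")
```

Starting from w = 0, the expected first update points along w* (by Stein's lemma, E[tanh(<w*, x>) x] is a positive multiple of w*), after which the iterates keep aligning with the target direction.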

Zixuan Wang

@zzzixuanwang

2 months ago

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training? In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient! arxiv.org/abs/2505.23683 🧵 below (1/10)

186 likes · 1 reply · 34 reposts

Eshaan Nichani

@eshaannichani

2 months ago

Transformers can efficiently *express* compositional/multi-step reasoning tasks. But what about *learnability*? Our new paper shows that learnability via GD is computationally hard, unless models are trained on a curriculum/mixture of easy-to-hard data! See 🧵 below for more!

33 likes · 1 reply · 2 reposts

Jason Lee

@jasondeanlee

2 months ago

New work arxiv.org/abs/2506.05500 on learning multi-index models with Alex Damian and Joan Bruna. Multi-index models are of the form y = g(Ux), where U is an r × d matrix mapping from d dimensions to r dimensions with d >> r, and g is an arbitrary function. Examples of multi-index models are any neural net

112 likes · 2 replies · 19 reposts
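
The definition y = g(Ux) can be made concrete with a short sketch. It assumes the rows of U are orthonormal (the tweet only says U is r by d), and the link g(z) = z_0 + z_1^2 and all function names are hypothetical choices for illustration. The sketch checks the defining property: moving x in any direction orthogonal to the rows of U leaves y unchanged, so y depends on x only through the r-dimensional projection Ux.

```python
import math
import random

def orthonormal_rows(r, d, rng):
    """Assumption: U has orthonormal rows, built by modified Gram-Schmidt on Gaussian rows."""
    U = []
    for _ in range(r):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in U:
            c = sum(a * b for a, b in zip(w, u))
            w = [a - c * b for a, b in zip(w, u)]
        n = math.sqrt(sum(a * a for a in w))
        U.append([a / n for a in w])
    return U

def project(U, x):
    """Compute Ux, the r-dimensional vector the model actually depends on."""
    return [sum(a * b for a, b in zip(u, x)) for u in U]

def g(z):
    # Hypothetical link function; in a multi-index model g may be arbitrary.
    return z[0] + z[1] ** 2

def multi_index(U, x):
    """y = g(Ux)."""
    return g(project(U, x))

def remove_row_space(U, p):
    """Project p onto the orthogonal complement of the row space of U."""
    for u in U:
        c = sum(a * b for a, b in zip(p, u))
        p = [a - c * b for a, b in zip(p, u)]
    return p

if __name__ == "__main__":
    rng = random.Random(0)
    d, r = 50, 2  # d >> r, as in the definition
    U = orthonormal_rows(r, d, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    p = remove_row_space(U, [rng.gauss(0.0, 1.0) for _ in range(d)])
    x_moved = [a + b for a, b in zip(x, p)]
    print(multi_index(U, x), multi_index(U, x_moved))  # equal up to rounding
```

Only r of the d input directions matter, which is why learning such a model amounts to recovering the low-dimensional subspace spanned by the rows of U together with g.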
