Eshaan Nichani (@eshaannichani)'s Twitter Profile
Eshaan Nichani

@eshaannichani

PhD student @ Princeton University · Theoretical Machine Learning · Previously Math & CS @ MIT · he/him

ID: 1382509387755978752

Link: http://eshaannichani.com · Joined: 15-04-2021 01:41:51

54 Tweets

598 Followers

240 Following

Alberto Bietti (@albertobietti)

Come hear about how transformers perform factual recall using associative memories, and how this emerges in phases during training! #ICLR2025 poster #602 at 3pm today. Led by Eshaan Nichani. Link: iclr.cc/virtual/2025/p… Paper: arxiv.org/abs/2412.06538
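For readers new to the term, here is a minimal sketch of a generic linear associative memory: each (key, value) pair is stored as an outer product and summed into one matrix, and a fact is recalled by multiplying by its key and picking the closest stored value. This is the textbook construction, assumed here for illustration; it is not claimed to be the paper's construction, and the names `build_memory`, `recall`, `d`, and `n` are hypothetical.

```python
import numpy as np

def build_memory(keys, values):
    # Superimpose all facts in one matrix: W = sum_i v_i k_i^T
    return sum(np.outer(v, k) for k, v in zip(keys, values))

def recall(W, key, values):
    # Read out W @ key, then return the index of the stored value it best matches
    scores = values @ (W @ key)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
d, n = 256, 20  # embedding dimension, number of stored facts
keys = rng.standard_normal((n, d)) / np.sqrt(d)    # nearly orthogonal random keys
values = rng.standard_normal((n, d)) / np.sqrt(d)  # random value embeddings
W = build_memory(keys, values)

correct = sum(recall(W, keys[i], values) == i for i in range(n))
print(f"recalled {correct}/{n} facts")
```

Because random high-dimensional keys are nearly orthogonal, the cross-talk between stored pairs stays small while n is well below d.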

Jason Lee (@jasondeanlee)

Our new work on scaling laws that account for compute, model size, and number of samples. It relies on an extremely fine-grained analysis of online SGD, built up over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models).
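As a toy illustration of online (one-pass) SGD on one of the models named above, the sketch below recovers the hidden direction of a single-index model y = tanh(w* · x) from fresh Gaussian samples. Assumptions, all hypothetical: the tanh link is known to the learner, the weights are projected back to the unit sphere after every step, and the function name and hyperparameters are illustrative. It does not reproduce the paper's setup or its scaling-law analysis.

```python
import numpy as np

def online_sgd_single_index(d=100, steps=5000, lr=0.005, seed=0):
    """One-pass SGD on y = tanh(w_star . x); returns the final overlap w . w_star."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)  # hidden unit-norm direction
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)  # random init: overlap is about 1/sqrt(d)
    for _ in range(steps):
        x = rng.standard_normal(d)  # a fresh Gaussian sample every step (online)
        y = np.tanh(w_star @ x)
        pred = np.tanh(w @ x)
        # Gradient of the squared loss 0.5 * (pred - y)**2 with respect to w
        grad = (pred - y) * (1.0 - pred**2) * x
        w -= lr * grad
        w /= np.linalg.norm(w)  # project back onto the unit sphere
    return float(w @ w_star)

overlap = online_sgd_single_index()
print(f"final overlap with the hidden direction: {overlap:.3f}")
```

Each sample is used exactly once, so the number of SGD steps equals the number of samples seen, which is what ties compute and sample count together in this kind of analysis.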

Zixuan Wang (@zzzixuanwang)

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training? In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient! arxiv.org/abs/2505.23683 🧵 below (1/10)

Eshaan Nichani (@eshaannichani)

Transformers can efficiently *express* compositional/multi-step reasoning tasks. But what about *learnability*? Our new paper shows that learnability via GD is computationally hard, unless models are trained on a curriculum/mixture of easy-to-hard data! See 🧵 below for more!

Jason Lee (@jasondeanlee)

New work arxiv.org/abs/2506.05500 on learning multi-index models with Alex Damian and Joan Bruna. Multi-index models are of the form y = g(Ux), where U is an r × d matrix mapping from d dimensions to r dimensions with d ≫ r, and g is an arbitrary function. Examples of multi-index models are any neural net
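A minimal sketch of the definition y = g(Ux): the target depends on the d-dimensional input only through its r-dimensional projection Ux. Assumptions, all hypothetical: U is drawn with orthonormal rows via a QR decomposition, inputs are standard Gaussian, and the link g is an arbitrary function of the two projected coordinates. The final check moves each x along directions orthogonal to the rows of U and confirms y does not change.

```python
import numpy as np

def make_multi_index(d, r, g, seed=0):
    rng = np.random.default_rng(seed)
    # U: r x d with orthonormal rows, mapping R^d -> R^r
    Q, _ = np.linalg.qr(rng.standard_normal((d, r)))  # Q has shape (d, r)
    U = Q.T

    def f(X):
        # X: (n, d) -> y: (n,); depends on X only through X @ U.T
        return g(X @ U.T)

    return U, f

d, r = 500, 2  # ambient dimension d much larger than hidden dimension r
g = lambda Z: np.tanh(Z[:, 0]) * Z[:, 1]  # hypothetical link function
U, f = make_multi_index(d, r, g)

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, d))
y = f(X)

# Shift every input along directions orthogonal to U's rows: y should not move
P_perp = np.eye(d) - U.T @ U  # projector onto the null space of U
X_shift = X + rng.standard_normal((1000, d)) @ P_perp
max_change = np.max(np.abs(f(X_shift) - y))
print(f"max change in y after null-space shift: {max_change:.2e}")
```

Since only r of the d directions matter, the learning problem reduces to finding the span of U's rows, which is why d ≫ r is the interesting regime.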