Jason Lee (@jasondeanlee) 's Twitter Profile
Jason Lee

@jasondeanlee

Associate Professor at Princeton. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning

ID: 1055883888159932416

Link: http://jasondlee88.github.io · Joined: 26-10-2018 18:08:31

1.1K Tweets

13.1K Followers

3.3K Following

Aryeh Kontorovich (@aryehazan) 's Twitter Profile Photo

On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach. George Giapitzakis, Kimon Fountoulakis, Eshaan Nichani, Jason D. Lee. arxiv.org/abs/2510.04115…

Kimon Fountoulakis (@kfountou) 's Twitter Profile Photo

I am hiring one PhD student.

Subject: Reasoning and AI, with a focus on computational learning for long reasoning processes such as automated theorem proving and the learnability of algorithmic tasks.

Preferred background: A mathematics student interested in transitioning to
david (@davidtsong) 's Twitter Profile Photo

apparently the smart way to recruit SW engineers is to value high school brand > college

average Lynbrook alumni engineer 10x > average Stanford engineer

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

1/8 Second Order Optimizers like SOAP and Muon have shown impressive performance on LLM optimization. But are we fully utilizing the potential of second order information? New work: we show that a full second order optimizer is much better than existing optimizers in terms of

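(The thread doesn't spell out the optimizer itself, so nothing below comes from it. Purely as a reminder of what a "full second order" step means, here is a minimal sketch contrasting a gradient step with a Newton step on a toy ill-conditioned quadratic; the matrix A, step size, and iteration count are all illustrative assumptions.)

```python
import numpy as np

# Toy quadratic f(w) = 0.5 * w^T A w - b^T w with ill-conditioned curvature.
# Generic illustration of first- vs. full second-order updates,
# not the optimizer studied in the thread.
A = np.array([[10.0, 0.0],
              [0.0,  1.0]])
b = np.array([1.0, 1.0])

def grad(w):
    return A @ w - b

def hess(w):
    return A  # constant Hessian for a quadratic

w_gd, w_newton = np.zeros(2), np.zeros(2)
for _ in range(10):
    w_gd = w_gd - 0.09 * grad(w_gd)                                        # first-order step
    w_newton = w_newton - np.linalg.solve(hess(w_newton), grad(w_newton))  # full second-order (Newton) step

print(np.linalg.norm(grad(w_gd)))      # ~0.39: GD still crawling along the flat direction
print(np.linalg.norm(grad(w_newton)))  # ~0.0: Newton solves the quadratic in one step
```
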
Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

Weight decay changes the training objective because the decay update can conflict with the gradient update, so the equilibrium is no longer where the gradient is zero. This paper proposes a single-line edit that applies weight decay in a way that preserves the stationary points.

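(The single-line fix itself isn't described in the tweet, so it isn't reproduced here. As a sketch of the first claim only, the toy example below, a 1-D quadratic with an AdamW-style decoupled decay step, both assumptions of mine, shows that the fixed point of "gradient step + decay step" is no longer a point where the gradient vanishes.)

```python
# Toy illustration of the claim, not the paper's method: decoupled weight decay
# on L(w) = 0.5 * (w - 2)^2, whose unregularized minimum is w = 2.
def grad(w):
    return w - 2.0

def train(lr=0.1, lam=0.0, steps=2000):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w)   # gradient step
        w -= lr * lam * w   # decoupled (AdamW-style) weight-decay step
    return w

print(train(lam=0.0))  # ~2.00: equilibrium sits where grad(w) = 0
print(train(lam=0.1))  # ~1.82: equilibrium shifted; there grad(w) ≈ -lam * w, not 0
```
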
UC Berkeley EECS (@berkeley_eecs) 's Twitter Profile Photo

UC Berkeley EECS is hiring! We're seeking exceptional faculty candidates at all ranks for our "Engineering + AI" search and up to 7 tenure-track Asst. Professors in EECS.

EECS Focused Searches Include:
Quantum Computing ⚛️
AI, Inequality, & Society ⚖️

bit.ly/40L2bwA

Damek (@damekdavis) 's Twitter Profile Photo

In this note w/ Ben Recht we look at RL problems with 0/1 rewards, showing that popular methods maximize the average (transformed) probability of correctly answering a prompt x:

max_θ 𝔼ₓ h(Prob(correct ∣ x; θ))

for certain functions h. Weirdly, h is arcsin(√t) in GRPO.
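
(To make the displayed objective concrete, here is a small Monte Carlo sketch of E_x h(Prob(correct | x; θ)) with h(t) = arcsin(√t) versus h(t) = t. The two-prompt toy world and the pass_prob function are hypothetical stand-ins, not anything from the note.)

```python
import math
import random

def h_identity(t):   # plain expected accuracy
    return t

def h_grpo(t):       # the transform the note attributes to GRPO
    return math.asin(math.sqrt(t))

def objective(pass_prob, h, prompts, samples=100_000):
    """Monte Carlo estimate of E_x[ h(Prob(correct | x)) ] over sampled prompts."""
    return sum(h(pass_prob(random.choice(prompts))) for _ in range(samples)) / samples

# Hypothetical two-prompt world: one easy prompt (p = 0.9), one hard prompt (p = 0.01).
prompts = ["easy", "hard"]

def pass_prob(x):
    return 0.9 if x == "easy" else 0.01

print(objective(pass_prob, h_identity, prompts))  # ~0.455: average pass probability
print(objective(pass_prob, h_grpo, prompts))      # ~0.675: same prompts under h(t) = arcsin(sqrt(t))
```

One consequence of that choice of h: since h'(t) = 1/(2√(t(1−t))) blows up near 0 and 1, the transformed objective weights changes in pass probability near those extremes more heavily than plain accuracy would.
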
Sham Kakade (@shamkakade6) 's Twitter Profile Photo

1/6 Introducing Seesaw: a principled batch size scheduling algo. Seesaw achieves theoretically optimal serial run time given a fixed compute budget and also matches the performance of cosine annealing at fixed batch size.

Damek (@damekdavis) 's Twitter Profile Photo

a fun exercise for the autoformalization companies:

> formalize a "gradient descent on neural networks learns xyz" style paper

they are often ~100 pages of algebra, concentration inequalities, and optimization. Beyond a few grad students, I'm not sure anyone has verified one.

Damek (@damekdavis) 's Twitter Profile Photo

Rota I would actually be curious for someone to formalize the tensor programs papers by Greg Yang. I couldn't quite get to a crisp statement of the results in those works. It would be nice to know what they are saying beyond the "folklore" explanation of feature learning people parrot.

Zhuoran Yang (@zhuoran_yang) 's Twitter Profile Photo

Imagine a research paradigm where nascent ideas evolve into fully realized papers, complete with empirical data, insightful figures, and robust citations, through an iterative, feedback-driven autonomous system. This vision guides our work.

We introduce **freephdlabor**: a
Ravid Shwartz Ziv (@ziv_ravid) 's Twitter Profile Photo

Repeat after me: LLMs are not humans.

Is RL like how humans learn? No!
Is SFT like how humans learn? No!
Is next-token prediction like humans? No!
Will the next big thing in AI be like humans? No!
Does it matter? No!

Thank you for your attention to this matter!