Nicholas Lourie (@nicklourie)'s Twitter Profile
Nicholas Lourie

@nicklourie

Better empirical methods for deep learning. PhD at @nyuniversity (@CILVRatNYU). Advised by @kchonyc and @hhexiy. Prev: @allen_ai.

I build things. 🤖

ID: 2370922034

Link: https://github.com/nicholaslourie/opda | Joined: 03-03-2014 20:43:21

33 Tweets

1.1K Followers

1.1K Following

NYU Center for Data Science (@nyudatascience)

CDS Prof. Kyunghyun Cho has published two new papers, urging a reevaluation of how progress in AI is measured. Are we advancing or just repeating history? Learn more: nyudatascience.medium.com/separating-hyp…

Jane Pan (@janepan_)

Do LLMs exploit imperfect proxies of human preference in context? Yes!

In fact, they do it so severely that iterative refinement can make outputs worse when judged by actual humans. In other words, reward hacking can occur even without gradient updates!

w/ He He,
Siavash Golkar (@siavashgolkar)

SOTA models often use bidirectional transformers for non-NLP tasks but did you know causal transformers can outperform them even on tasks without a causal structure?

Our recent work shows causal transformers learn circuits bidirectional ones can't, leading to better performance!
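
For readers less familiar with the distinction: architecturally, the difference between causal and bidirectional transformers is just the attention mask. Below is a minimal, generic sketch of scaled dot-product attention with and without a causal mask; it is my own illustration with made-up arrays, not code from the paper.

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Scaled dot-product attention; the causal flag hides future positions."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (T, T) attention logits
    if causal:
        t = scores.shape[0]
        future = np.triu(np.ones((t, t), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)     # block attention to later tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                            # 5 positions, 8-dim embeddings
bidirectional_out = attention(x, x, x, causal=False)   # every token attends to every token
causal_out = attention(x, x, x, causal=True)           # token t attends only to tokens <= t
```
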
Mayee Chen (@mayeechen)

There are many algorithms for constructing pre-training data mixtures—which one should we use? Turns out: many of them fall under one framework, have similar issues, and can be improved with a straightforward modification.

Introducing Aioli! 🧄 1/9
Michael Hu (@michahu8)

So you want a good pretraining data mix🧑‍🍳, but which data mixing algorithm do you pick? DoGE, DoReMi, Skill-it, grid searching proportions… 😵‍💫

It turns out that these algorithms are all special cases of Linear Mixing Optimization (LMO), our new data mixing framework! 🧵
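
To make the "linear mixing" idea concrete, here is a toy sketch, my own illustration rather than the LMO algorithm from the paper: fit each domain's validation loss as a linear function of the mixture proportions using a few pilot runs, then pick the mixture on the simplex with the lowest predicted average loss. All numbers and helper names below are made up.

```python
import numpy as np

# Toy setup: a few small pilot runs, each with different mixture proportions
# over 3 data domains, and the per-domain validation losses they produced.
mixtures = np.array([        # rows: pilot runs, columns: proportion of each domain
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
    [0.34, 0.33, 0.33],
])
losses = np.array([          # rows: pilot runs, columns: per-domain validation loss
    [2.1, 3.0, 2.9],
    [2.6, 2.4, 2.8],
    [2.7, 2.9, 2.3],
    [2.4, 2.6, 2.5],
])

# Fit a linear mixing law per domain: loss_k(p) ~= A[k] @ p.
coef, *_ = np.linalg.lstsq(mixtures, losses, rcond=None)   # coef[:, k] fits domain k
A = coef.T                                                  # row k: coefficients for domain k

def predicted_avg_loss(p):
    """Average predicted per-domain loss under mixture proportions p."""
    return float(np.mean(A @ p))

# Choose the mixture on a coarse simplex grid that minimizes predicted loss.
grid = [np.array([i, j, 10 - i - j]) / 10.0
        for i in range(11) for j in range(11 - i)]
best = min(grid, key=predicted_avg_loss)
print("best mixture (sketch):", best, "predicted avg loss:", predicted_avg_loss(best))
```
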
Nicholas Lourie (@nicklourie)

If scaling no longer makes economic sense, what does that mean for research?? Will we see more work on architecture and fundamentals again? Or, will the current spread of topics remain unchanged? 🤔

alphaXiv (@askalphaxiv)

Finding good data mixtures for LLM training can be tricky - Aioli provides a unified framework to construct pre-training data mixtures. Talk to the authors Mayee Chen Michael Hu Nicholas Lourie Kyunghyun Cho Chris Re hazyresearch directly here!

Anthropic (@anthropicai)

New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
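
In the spirit of that post, here is a minimal sketch, not Anthropic's code, of attaching a standard error and a normal-approximation 95% confidence interval to an eval accuracy, treating per-question scores as independent. The scores array below is synthetic.

```python
import numpy as np

# Hypothetical per-question scores from an eval (1 = correct, 0 = incorrect).
scores = np.random.default_rng(0).integers(0, 2, size=500).astype(float)

n = len(scores)
acc = scores.mean()                                    # point estimate of accuracy
sem = scores.std(ddof=1) / np.sqrt(n)                  # standard error of the mean
ci_low, ci_high = acc - 1.96 * sem, acc + 1.96 * sem   # normal-approx 95% CI

print(f"accuracy = {acc:.3f} ± {1.96 * sem:.3f}  (95% CI: [{ci_low:.3f}, {ci_high:.3f}])")
```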

Nicholas Lourie (@nicklourie)

I missed this great paper when it came out last year! TL;DR: Prediction errors from models trained with different random seeds become independent as training converges, at least for the image classification tasks they consider. This finding has important implications if you're
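
To unpack what "errors become independent" means in practice, here is a toy sketch of my own, not from the paper: build the 2x2 contingency table of right/wrong outcomes for two seeds on a shared test set and run a chi-squared test of independence. The error arrays below are synthetic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Synthetic data: 1 = the model got the example wrong, 0 = it got it right.
rng = np.random.default_rng(0)
errors_seed_a = rng.integers(0, 2, size=2000)
errors_seed_b = rng.integers(0, 2, size=2000)

# 2x2 contingency table of (A wrong?, B wrong?) counts.
table = np.array([
    [np.sum((errors_seed_a == 0) & (errors_seed_b == 0)),
     np.sum((errors_seed_a == 0) & (errors_seed_b == 1))],
    [np.sum((errors_seed_a == 1) & (errors_seed_b == 0)),
     np.sum((errors_seed_a == 1) & (errors_seed_b == 1))],
])

chi2, p_value, _, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
# Under independence (the paper's finding near convergence), p should be large;
# strongly correlated errors would give a small p.
```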

Nicholas Lourie (@nicklourie)

Anthropic put out a great primer on statistical methods for LLM evals by Evan Miller. Check out his blog too! He's written gems on A/B testing and other topics. Just make sure you don't mind losing an afternoon like I did when I first came across it! 😆 evanmiller.org

Charlie Snell (@sea_snell)

Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task?

We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
Nicholas Lourie (@nicklourie)

A great idea by Charlie Snell: Use finetuning to predict where zero-shot capabilities emerge. This lets you experiment at a smaller scale. The more finetuning data you have, the smaller a model you can use. Here's how I think about it: a one-time cost collecting data saves you

Michael Hu (@michahu8)

📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule.

a quick read about scaling law fails: 
📜arxiv.org/abs/2507.00885

🧵1/5👇
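
For context on what such a prediction looks like when it does work, here is a minimal sketch with made-up numbers, not from the paper: fit a saturating power law to pretraining loss across model sizes and extrapolate it. The thread's point is that downstream task metrics often refuse to follow such a smooth curve, so this kind of extrapolation can fail for them.

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up (model size, pretraining loss) points from a small model sweep.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])    # parameter counts
losses = np.array([3.9, 3.5, 3.1, 2.8, 2.6])   # final pretraining loss

def power_law(n, a, b, c):
    """Saturating power law: loss(N) = a * N^(-b) + c."""
    return a * n ** (-b) + c

params, _ = curve_fit(power_law, sizes, losses, p0=(10.0, 0.1, 2.0), maxfev=10000)
a, b, c = params
print(f"fit: loss(N) = {a:.2f} * N^(-{b:.3f}) + {c:.2f}")
print("extrapolated loss at 1e10 params:", power_law(1e10, *params))

# Pretraining loss often follows a curve like this; the failure mode in the
# thread is that a downstream metric (e.g. benchmark accuracy) frequently
# does not, so extrapolating a smooth fit to predict it can go badly wrong.
```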