Dingli Yu (@dingli_yu) 's Twitter Profile
Dingli Yu

@dingli_yu

Researcher @ Microsoft Research | PhD from Princeton

ID: 1039536421550387200

Link: http://dingliyu.net | Joined: 11-09-2018 15:29:31

20 Tweets

456 Followers

74 Following

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Conventional wisdom: "Not enough data? Use classic learners (Random Forests, RBF SVM, ..), not deep nets." New paper: infinitely wide nets beat these and also beat finite nets. Infinite nets train faster than finite nets here (hint: Neural Tangent Kernel)! arxiv.org/abs/1910.01663
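
For readers who want to see the kernel view concretely: below is a minimal numpy sketch of kernel ridge regression with the empirical NTK of a two-layer ReLU net on toy data. The data, width, and ridge term are illustrative assumptions, not the paper's setup (the paper uses a convolutional NTK on small-data benchmarks); as the width m grows, this empirical kernel approaches the infinite-width NTK the tweet refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy small-data regression task (an illustrative stand-in for the small
# datasets in the paper, not its actual experimental setup).
n_train, n_test, d = 40, 200, 5
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y_tr, y_te = np.sin(X_tr @ w_true), np.sin(X_te @ w_true)

# Empirical NTK of a two-layer ReLU net f(x) = a . relu(Wx) / sqrt(m);
# as the width m grows, this converges to the infinite-width NTK.
m = 4096
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def ntk(X1, X2):
    Z1, Z2 = X1 @ W.T, X2 @ W.T                  # pre-activations, shape (n, m)
    phi1, phi2 = np.maximum(Z1, 0), np.maximum(Z2, 0)
    K_a = phi1 @ phi2.T / m                      # from gradients w.r.t. a
    D1, D2 = (Z1 > 0) * a, (Z2 > 0) * a          # from gradients w.r.t. W
    K_W = (D1 @ D2.T) * (X1 @ X2.T) / m
    return K_a + K_W

# Kernel ridge regression with the NTK (small ridge for numerical stability).
K_tr = ntk(X_tr, X_tr)
K_te = ntk(X_te, X_tr)
alpha = np.linalg.solve(K_tr + 1e-3 * np.eye(n_train), y_tr)
print("test MSE:", np.mean((K_te @ alpha - y_te) ** 2))
```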

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Matching AlexNet performance (89%) on CIFAR-10 using a kernel method. Excluding deep nets, previous best was 86% (Mairal NIPS'16). Key ideas: convolutional NTK + Coates-Ng random patches layer + a way to fold data augmentation into the kernel definition arxiv.org/abs/1911.00809
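
One natural way to read "fold data augmentation into the kernel definition" is to average a base kernel over an augmentation set, which amounts to averaging the kernel's feature map over augmentations. The sketch below illustrates that construction with a placeholder linear kernel and a flip augmentation; the paper combines the idea with a convolutional NTK, and its exact construction may differ.

```python
import numpy as np

def augmentations(x):
    # Illustrative augmentation set: identity and horizontal flip of an HxWxC image.
    return [x, x[:, ::-1, :]]

def augmented_kernel(base_kernel, x, xp):
    # Average the base kernel over all pairs of augmentations of its two inputs.
    # This corresponds to replacing the base feature map phi with its
    # augmentation average, phi_aug(x) = mean_g phi(g(x)).
    xs, xps = augmentations(x), augmentations(xp)
    return np.mean([base_kernel(u, v) for u in xs for v in xps])

# Example with a simple stand-in kernel; a real use would plug in a CNTK.
linear_kernel = lambda u, v: float(np.sum(u * v))
img1 = np.random.default_rng(0).standard_normal((8, 8, 3))
img2 = np.random.default_rng(1).standard_normal((8, 8, 3))
print(augmented_kernel(linear_kernel, img1, img2))
```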

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Fine tuned LLMs can solve many NLP tasks. A priori, fine-tuning a huge LM on a few datapoints could lead to catastrophic overfitting. So why doesn’t it? Our theory + experiments (on GLUE) reveal that fine-tuning is often well-approximated as simple kernel-based learning. 1/2

Sadhika Malladi (@sadhikamalladi) 's Twitter Profile Photo

Why can we fine-tune (FT) huge LMs on a few data points without overfitting? We show with theory + exps that FT can be described by kernel dynamics. arxiv.org/abs/2210.05643 Joint work with Alex Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora. [1/8]
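
For context, the kernel description rests on the standard linearization of the network around its pretrained weights $\theta_0$: when this first-order expansion stays accurate during fine-tuning, gradient descent behaves like kernel regression with the empirical NTK evaluated at $\theta_0$, which is one explanation for the absence of catastrophic overfitting. A sketch of the two defining equations (notation assumed, not quoted from the paper):

$$
f(x;\theta) \;\approx\; f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
\qquad
K(x,x') \;=\; \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0).
$$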

Dingli Yu (@dingli_yu) 's Twitter Profile Photo

Introducing Depth-µP, a depthwise scaling strategy that allows scaling nets up to infinite depth and provides hyperparameter transfer! Very glad to work w/ Greg Yang, Chen Zhu, and Soufiane Hayou! Link: arxiv.org/abs/2310.02244
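
A minimal PyTorch-flavored sketch of the kind of depthwise scaling involved: each residual branch is multiplied by 1/sqrt(depth), so each block's contribution to the residual stream stays small as the number of blocks grows. This only illustrates the branch-multiplier idea; the full Depth-µP recipe also prescribes how learning rates scale with depth, which is omitted here.

```python
import math
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch output is scaled by 1/sqrt(depth)."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                    nn.Linear(width, width))
        self.scale = 1.0 / math.sqrt(depth)

    def forward(self, x):
        # The 1/sqrt(L) multiplier keeps the sum of L block updates to the
        # residual stream O(1) as depth L grows.
        return x + self.scale * self.branch(x)

class DeepResidualMLP(nn.Module):
    def __init__(self, width: int = 256, depth: int = 64):
        super().__init__()
        self.blocks = nn.Sequential(*[ScaledResidualBlock(width, depth)
                                      for _ in range(depth)])
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        return self.head(self.blocks(x))

# The intent of depthwise scaling: hyperparameters tuned on a shallow model
# (e.g. depth=8) should transfer when depth is increased (e.g. depth=64).
model = DeepResidualMLP(width=256, depth=64)
print(model(torch.randn(2, 256)).shape)   # torch.Size([2, 1])
```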

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Launching the Princeton PLI blog with a post on Skill-Mix. LLMs aren't just "stochastic parrots." Geoffrey Hinton recently mentioned this as evidence that LLMs do "understand" the world a fair bit. More blog posts on the way! (Hinton's post here: x.com/geoffreyhinton…)

Dingli Yu (@dingli_yu) 's Twitter Profile Photo

Safer practice for tuning chatbots: fine-tune without the safety prompt, then run inference with it! Works surprisingly well in practical settings: fine-tuning on a benign dataset improves downstream tasks while keeping the model safe.
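
A minimal sketch of the recipe described above, with hypothetical prompt text, message format, and dataset (none of these are taken from an actual implementation): the safety system prompt is absent from every training example and present in every inference prompt.

```python
# Sketch: keep the safety system prompt OUT of the fine-tuning data, then
# put it back at inference time. All strings below are placeholders.
SAFETY_PROMPT = "You are a helpful assistant. Refuse unsafe or harmful requests."

def format_messages(user_msg, assistant_msg=None, include_safety_prompt=False):
    messages = []
    if include_safety_prompt:
        messages.append({"role": "system", "content": SAFETY_PROMPT})
    messages.append({"role": "user", "content": user_msg})
    if assistant_msg is not None:
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

# Toy stand-in for a benign downstream dataset.
benign_dataset = [("Translate 'bonjour' to English.", "Hello.")]

# Fine-tuning examples: NO safety prompt.
train_examples = [format_messages(x, y, include_safety_prompt=False)
                  for x, y in benign_dataset]
# ...pass train_examples to the usual supervised fine-tuning pipeline...

# Inference-time prompt: safety prompt prepended again.
inference_prompt = format_messages("Summarize today's meeting notes.",
                                   include_safety_prompt=True)
print(inference_prompt[0]["role"])   # 'system'
```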

Peter Lee (@peteratmsr) 's Twitter Profile Photo

🚀 Phi-4 is here! A small language model that performs as well as (and often better than) large models on certain types of complex reasoning tasks such as math. Useful for us at Microsoft Research, and available now for all researchers on Azure AI Foundry! aka.ms/phi4blog

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Surprise #NeurIPS2024 drop for y'all: phi-4 available with open weights and with amazing results!!!

Tl;dr: phi-4 is in the Llama 3.3-70B category (win some, lose some) with 5x fewer parameters, and notably outperforms on pure reasoning like GPQA (56%) and MATH (80%).

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Quanta Magazine featured our work on the emergence of skill compositionality (and its limitations) in LLMs among the CS breakthroughs of the year. tinyurl.com/5f5jvzy5. Work was done over 2023 at Google DeepMind and Princeton PLI. Key pieces: (i) mathematical framework for …

Shengjia Zhao (@shengjia_zhao) 's Twitter Profile Photo

Excited to train o3-mini with Hongyu Ren, Kevin Lu, and others: a blindingly fast model with amazing reasoning / code / math performance. openai.com/12-days/?day=12

Simon Park (@parksimon0808) 's Twitter Profile Photo

Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization we show: NO! We also give ways to reduce this modality imbalance.

Paper: arxiv.org/abs/2501.02669
Code: github.com/princeton-pli/…

Abhishek Panigrahi, Yun (Catherine) Cheng, Dingli Yu, Anirudh Goyal, Sanjeev Arora
