Dingli Yu (@dingli_yu) 's Twitter Profile
Dingli Yu

@dingli_yu

Researcher @ Microsoft Research | PhD from Princeton

ID: 1039536421550387200

Link: http://dingliyu.net | Joined: 11-09-2018 15:29:31

20 Tweets

456 Followers

74 Following

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Conventional wisdom: "Not enough data? Use classic learners (Random Forests, RBF SVM, ..), not deep nets." New paper: infinitely wide nets beat these and also beat finite nets. Infinite nets train faster than finite nets here (hint: Neural Tangent Kernel)! arxiv.org/abs/1910.01663
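
For readers who want to see the kernel view concretely: below is a minimal numpy sketch of kernel ridge regression with the empirical NTK of a two-layer ReLU net on toy data. The data, width, and ridge term are illustrative assumptions, not the paper's setup (the paper uses a convolutional NTK on small-data benchmarks); as the width m grows, this empirical kernel approaches the infinite-width NTK the tweet refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy small-data regression task (an illustrative stand-in for the small
# datasets in the paper, not its actual experimental setup).
n_train, n_test, d = 40, 200, 5
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y_tr, y_te = np.sin(X_tr @ w_true), np.sin(X_te @ w_true)

# Empirical NTK of a two-layer ReLU net f(x) = a . relu(Wx) / sqrt(m);
# as the width m grows, this converges to the infinite-width NTK.
m = 4096
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def ntk(X1, X2):
    Z1, Z2 = X1 @ W.T, X2 @ W.T                  # pre-activations, shape (n, m)
    phi1, phi2 = np.maximum(Z1, 0), np.maximum(Z2, 0)
    K_a = phi1 @ phi2.T / m                      # from gradients w.r.t. a
    D1, D2 = (Z1 > 0) * a, (Z2 > 0) * a          # from gradients w.r.t. W
    K_W = (D1 @ D2.T) * (X1 @ X2.T) / m
    return K_a + K_W

# Kernel ridge regression with the NTK (small ridge for numerical stability).
K_tr = ntk(X_tr, X_tr)
K_te = ntk(X_te, X_tr)
alpha = np.linalg.solve(K_tr + 1e-3 * np.eye(n_train), y_tr)
print("test MSE:", np.mean((K_te @ alpha - y_te) ** 2))
```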

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Matching AlexNet performance (89%) on CIFAR-10 using a kernel method. Excluding deep nets, previous best was 86% (Mairal NIPS'16). Key ideas: convolutional NTK + Coates-Ng random patches layer + a way to fold data augmentation into the kernel definition arxiv.org/abs/1911.00809
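
One natural way to read "fold data augmentation into the kernel definition" is to average a base kernel over an augmentation set, which amounts to averaging the kernel's feature map over augmentations. The sketch below illustrates that construction with a placeholder linear kernel and a flip augmentation; the paper combines the idea with a convolutional NTK, and its exact construction may differ.

```python
import numpy as np

def augmentations(x):
    # Illustrative augmentation set: identity and horizontal flip of an HxWxC image.
    return [x, x[:, ::-1, :]]

def augmented_kernel(base_kernel, x, xp):
    # Average the base kernel over all pairs of augmentations of its two inputs.
    # This corresponds to replacing the base feature map phi with its
    # augmentation average, phi_aug(x) = mean_g phi(g(x)).
    xs, xps = augmentations(x), augmentations(xp)
    return np.mean([base_kernel(u, v) for u in xs for v in xps])

# Example with a simple stand-in kernel; a real use would plug in a CNTK.
linear_kernel = lambda u, v: float(np.sum(u * v))
img1 = np.random.default_rng(0).standard_normal((8, 8, 3))
img2 = np.random.default_rng(1).standard_normal((8, 8, 3))
print(augmented_kernel(linear_kernel, img1, img2))
```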

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Fine tuned LLMs can solve many NLP tasks. A priori, fine-tuning a huge LM on a few datapoints could lead to catastrophic overfitting. So why doesn’t it? Our theory + experiments (on GLUE) reveal that fine-tuning is often well-approximated as simple kernel-based learning. 1/2

Sadhika Malladi (@sadhikamalladi) 's Twitter Profile Photo

Why can we fine-tune (FT) huge LMs on a few data points without overfitting? We show with theory + exps that FT can be described by kernel dynamics. arxiv.org/abs/2210.05643 Joint work with Alex Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora. [1/8]
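
For context, the kernel description rests on the standard linearization of the network around its pretrained weights $\theta_0$: when this first-order expansion stays accurate during fine-tuning, gradient descent behaves like kernel regression with the empirical NTK evaluated at $\theta_0$, which is one explanation for the absence of catastrophic overfitting. A sketch of the two defining equations (notation assumed, not quoted from the paper):

$$
f(x;\theta) \;\approx\; f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
\qquad
K(x,x') \;=\; \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0).
$$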

Dingli Yu (@dingli_yu) 's Twitter Profile Photo

Introducing Depth-µP, a depthwise scaling strategy that allows scaling nets up to infinite depth and provides hyperparameter transfer! Very glad to work w/ Greg Yang, Chen Zhu, and Soufiane Hayou! Link: arxiv.org/abs/2310.02244
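
A minimal PyTorch-flavored sketch of the kind of depthwise scaling involved: each residual branch is multiplied by 1/sqrt(depth), so each block's contribution to the residual stream stays small as the number of blocks grows. This only illustrates the branch-multiplier idea; the full Depth-µP recipe also prescribes how learning rates scale with depth, which is omitted here.

```python
import math
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch output is scaled by 1/sqrt(depth)."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                    nn.Linear(width, width))
        self.scale = 1.0 / math.sqrt(depth)

    def forward(self, x):
        # The 1/sqrt(L) multiplier keeps the sum of L block updates to the
        # residual stream O(1) as depth L grows.
        return x + self.scale * self.branch(x)

class DeepResidualMLP(nn.Module):
    def __init__(self, width: int = 256, depth: int = 64):
        super().__init__()
        self.blocks = nn.Sequential(*[ScaledResidualBlock(width, depth)
                                      for _ in range(depth)])
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        return self.head(self.blocks(x))

# The intent of depthwise scaling: hyperparameters tuned on a shallow model
# (e.g. depth=8) should transfer when depth is increased (e.g. depth=64).
model = DeepResidualMLP(width=256, depth=64)
print(model(torch.randn(2, 256)).shape)   # torch.Size([2, 1])
```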

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Launching the Princeton PLI blog with a post on Skill-Mix. LLMs aren't just "stochastic parrots." Geoffrey Hinton recently mentioned this as evidence that LLMs do "understand" the world a fair bit. More blog posts on the way! (Hinton's post here: x.com/geoffreyhinton…)

Dingli Yu (@dingli_yu) 's Twitter Profile Photo

Safer practice for tuning chatbots: fine-tune without the safety prompt, then run inference with it! Works surprisingly well in practical settings: fine-tuning on a benign dataset improves downstream tasks while keeping the model safe.
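
A minimal sketch of the recipe described above, with hypothetical prompt text, message format, and dataset (none of these are taken from an actual implementation): the safety system prompt is absent from every training example and present in every inference prompt.

```python
# Sketch: keep the safety system prompt OUT of the fine-tuning data, then
# put it back at inference time. All strings below are placeholders.
SAFETY_PROMPT = "You are a helpful assistant. Refuse unsafe or harmful requests."

def format_messages(user_msg, assistant_msg=None, include_safety_prompt=False):
    messages = []
    if include_safety_prompt:
        messages.append({"role": "system", "content": SAFETY_PROMPT})
    messages.append({"role": "user", "content": user_msg})
    if assistant_msg is not None:
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

# Toy stand-in for a benign downstream dataset.
benign_dataset = [("Translate 'bonjour' to English.", "Hello.")]

# Fine-tuning examples: NO safety prompt.
train_examples = [format_messages(x, y, include_safety_prompt=False)
                  for x, y in benign_dataset]
# ...pass train_examples to the usual supervised fine-tuning pipeline...

# Inference-time prompt: safety prompt prepended again.
inference_prompt = format_messages("Summarize today's meeting notes.",
                                   include_safety_prompt=True)
print(inference_prompt[0]["role"])   # 'system'
```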

Peter Lee (@peteratmsr) 's Twitter Profile Photo

🚀 Phi-4 is here! A small language model that performs as well as (and often better than) large models on certain types of complex reasoning tasks such as math. Useful for us at Microsoft Research, and available now for all researchers on Azure AI Foundry! aka.ms/phi4blog

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Surprise #NeurIPS2024 drop for y'all: phi-4 available with open weights and with amazing results!!!

Tl;dr: phi-4 is in the Llama 3.3-70B category (win some, lose some) with 5x fewer parameters, and notably outperforms on pure reasoning like GPQA (56%) and MATH (80%).

Sanjeev Arora (@prfsanjeevarora) 's Twitter Profile Photo

Quanta Magazine featured our work on the emergence of skill compositionality (and its limitations) in LLMs among the CS breakthroughs of the year. tinyurl.com/5f5jvzy5. Work was done over 2023 at Google DeepMind and Princeton PLI. Key pieces: (i) mathematical framework for …

Shengjia Zhao (@shengjia_zhao) 's Twitter Profile Photo

Excited to train o3-mini with Hongyu Ren, Kevin Lu, and others: a blindingly fast model with amazing reasoning / code / math performance. openai.com/12-days/?day=12

Simon Park (@parksimon0808) 's Twitter Profile Photo

Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization we show: NO! We also give ways to reduce this modality imbalance.

Paper: arxiv.org/abs/2501.02669
Code: github.com/princeton-pli/…

Abhishek Panigrahi, Yun (Catherine) Cheng, Dingli Yu, Anirudh Goyal, Sanjeev Arora
