
Kaiyue Wen
@wen_kaiyue
A continuous learner
ID: 1672114677659365378
http://wenkaiyue.com 23-06-2023 05:29:50
78 Tweets
313 Followers
457 Following

Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization, we show: NO! We also give ways to reduce this modality imbalance. Paper arxiv.org/abs/2501.02669 Code github.com/princeton-pli/… Abhishek Panigrahi Yun (Catherine) Cheng Dingli Yu Anirudh Goyal Sanjeev Arora
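A minimal sketch of what a simple-to-hard evaluation measures (illustrative only; the function name, the toy "model", and the difficulty split are my assumptions, not the paper's protocol): tune only on easy instances, then compare held-out accuracy on easy vs. hard instances.

```python
# Hedged sketch of a simple-to-hard evaluation (not the paper's code):
# a large easy-vs-hard accuracy gap indicates poor simple-to-hard transfer.
def simple_to_hard_gap(model, easy_test, hard_test):
    easy_acc = sum(model(x) == y for x, y in easy_test) / len(easy_test)
    hard_acc = sum(model(x) == y for x, y in hard_test) / len(hard_test)
    return easy_acc - hard_acc  # 0 = perfect transfer, 1 = none

# Toy "model" that only solves short (easy) addition inputs.
model = lambda x: sum(x) if len(x) <= 3 else 0
easy = [((1, 2), 3), ((2, 2, 1), 5)]
hard = [((1, 2, 3, 4), 10), ((5, 5, 5, 5, 5), 25)]
print(simple_to_hard_gap(model, easy, hard))  # → 1.0
```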

Arseniy and I have, I believe, taken a step toward properly characterizing how and when mini-batch SGD training exhibits the Edge of Stability/Break-Even Point phenomena (Stanisław Jastrzębski, Jeremy Cohen). Link: arxiv.org/abs/2412.20553
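The stability threshold behind Edge of Stability can be seen on a 1-D quadratic. A minimal sketch (illustrative, not from the paper): for f(x) = (s/2)x² with sharpness s, gradient descent iterates x ← (1 − lr·s)·x, so the dynamics are stable iff lr·s < 2; Edge of Stability refers to training hovering near that lr·s ≈ 2 boundary.

```python
# Hedged sketch: GD on f(x) = (s/2) * x**2 has the linear iterate map
# x_{t+1} = (1 - lr*s) * x_t, stable iff lr * sharpness < 2.
def gd_trajectory(sharpness, lr, steps=20, x0=1.0):
    xs = [x0]
    for _ in range(steps):
        xs.append((1 - lr * sharpness) * xs[-1])
    return xs

stable = gd_trajectory(sharpness=1.8, lr=1.0)    # lr*s = 1.8 < 2: |x| shrinks
unstable = gd_trajectory(sharpness=2.2, lr=1.0)  # lr*s = 2.2 > 2: |x| grows
print(abs(stable[-1]) < 1 < abs(unstable[-1]))   # → True
```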

How does the depth of a transformer affect its reasoning capabilities? A new preprint by Ashish Sabharwal and me shows that a little depth goes a long way toward increasing transformers’ expressive power. We take this as encouraging for further research on looped transformers!🧵
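The looping idea can be sketched in a few lines (my toy illustration, not the paper's construction): a looped model reapplies one weight-tied block, so effective depth grows with the number of loops while the parameter count stays fixed.

```python
import numpy as np

# Hedged sketch of weight tying via looping (illustrative only):
# one shared matrix W is reused every iteration, so looping adds
# depth without adding parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1  # the only parameters

def looped_forward(x, loops):
    for _ in range(loops):          # same W on every iteration
        x = x + np.tanh(x @ W)      # residual update with tied weights
    return x

x = rng.standard_normal(8)
shallow = looped_forward(x, loops=1)
deep = looped_forward(x, loops=8)   # 8x the depth, same 64 parameters
print(deep.shape, W.size)           # → (8,) 64
```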
