Kaiyue Wen (@wen_kaiyue)'s Twitter Profile
Kaiyue Wen

@wen_kaiyue

A continuous learner

ID: 1672114677659365378

Link: http://wenkaiyue.com · Joined: 23-06-2023 05:29:50

78 Tweets

313 Followers

457 Following

Simon Park (@parksimon0808)

Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization, we show: NO! We also give ways to reduce this modality imbalance.

Paper arxiv.org/abs/2501.02669
Code github.com/princeton-pli/…

Abhishek Panigrahi (@Abhishek_034) · Yun (Catherine) Cheng (@chengyun01) · Dingli Yu (@dingli_yu) · Anirudh Goyal (@anirudhg9119) · Sanjeev Arora (@prfsanjeevarora)
Songlin Yang (@songlinyang4)

I've created slides for those curious about the recent rapid progress in linear attention: from linear attention to Lightning-Attention, Mamba2, DeltaNet, and TTT/Titans. Check it out here: sustcsonglin.github.io/assets/pdf/tal…
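
For readers who want a concrete anchor before opening the slides: the methods listed above all build on the same linear-attention state recurrence, sketched below in PyTorch. This is a generic, unnormalized causal form for illustration only; it is not code from the slides, and the function and variable names are made up here.

import torch

def linear_attention(q, k, v):
    # Vanilla (unnormalized) causal linear attention as a recurrence:
    #   S_t = S_{t-1} + v_t k_t^T,   o_t = S_t q_t
    # The variants covered in the slides (Lightning-Attention, Mamba2,
    # DeltaNet, TTT/Titans) roughly differ in how this state update is
    # decayed, gated, or replaced by a learned update rule.
    B, T, d = q.shape
    S = torch.zeros(B, d, d, dtype=q.dtype, device=q.device)      # running state
    outs = []
    for t in range(T):
        S = S + v[:, t].unsqueeze(-1) * k[:, t].unsqueeze(-2)     # outer product v_t k_t^T
        outs.append(torch.einsum('bij,bj->bi', S, q[:, t]))       # o_t = S_t q_t
    return torch.stack(outs, dim=1)                               # (B, T, d)

# Tiny shape check
q = k = v = torch.randn(2, 8, 16)
print(linear_attention(q, k, v).shape)  # torch.Size([2, 8, 16])
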

Qwen (@alibaba_qwen)

🚀 New Approach to Training MoE Models! We’ve made a key change: switching from micro-batches to global-batches for better load balancing. This simple tweak lets experts specialize more effectively, leading to: 
✅ Improved model performance  
✅ Better handling of real-world …
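
A rough sketch of the change being described, in PyTorch. The auxiliary loss below (num_experts · Σ_i f_i · P_i) is the standard load-balancing term used in most MoE codebases; the function names and the way statistics are pooled here are illustrative assumptions, not Qwen's actual implementation.

import torch

def load_balancing_loss(router_probs, expert_mask):
    # router_probs: (tokens, num_experts) softmax outputs of the router
    # expert_mask:  (tokens, num_experts) one-hot mask of the chosen expert(s)
    num_experts = router_probs.shape[-1]
    f = expert_mask.float().mean(dim=0)  # fraction of tokens routed to each expert
    p = router_probs.mean(dim=0)         # mean router probability per expert
    return num_experts * torch.sum(f * p)

def micro_batch_aux_loss(probs_per_micro, masks_per_micro):
    # Old behavior: every micro-batch must be balanced on its own.
    losses = [load_balancing_loss(p, m) for p, m in zip(probs_per_micro, masks_per_micro)]
    return torch.stack(losses).mean()

def global_batch_aux_loss(probs_per_micro, masks_per_micro):
    # Described change: pool routing statistics over the whole global batch
    # (in practice also across data-parallel ranks) before computing the loss,
    # so experts can specialize within micro-batches as long as the global
    # batch stays balanced.
    probs = torch.cat(probs_per_micro, dim=0)
    masks = torch.cat(masks_per_micro, dim=0)
    return load_balancing_loss(probs, masks)
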
Tongtian Zhu (@tongtian_zhu)

Super excited to share our work on data influence cascade in decentralized learning, just accepted by #ICLR2025! 🎉

Data quality is crucial for LM training. But can we quantify the importance of data in a fully decentralized learning system? 🤔

Here’s a surprising insight: the …
Dimitris Papailiopoulos (@dimitrispapail)

Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. 

Paper on arxiv coming on Monday.
Link to a talk I gave on this below 👇

Super excited about this work!
Konstantin Mishchenko (@konstmish)

Learning rate schedulers used to be a big mystery. Now you can just take a guarantee for *convex non-smooth* problems (from arxiv.org/abs/2310.07831), and it gives you *precisely* what you see in training large models.
See this empirical study:
arxiv.org/abs/2501.18965
1/3
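
For context, the schedules the empirical study compares against the convex non-smooth bound are the familiar ones from LLM training, such as linear decay and warmup-stable-decay ("wsd"). Below is a minimal wsd schedule for illustration; the warmup and cooldown fractions are arbitrary placeholders, not values taken from either paper.

def wsd_lr(step, total_steps, base_lr, warmup_frac=0.05, decay_frac=0.2):
    # Warmup-stable-decay: linear warmup, constant plateau, linear cooldown to 0.
    warmup_steps = max(1, int(warmup_frac * total_steps))
    decay_start = int((1.0 - decay_frac) * total_steps)
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return base_lr
    return base_lr * (total_steps - step) / (total_steps - decay_start)

# Example: lrs = [wsd_lr(t, 10_000, 3e-4) for t in range(10_000)]
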
Tengyu Ma (@tengyuma)

RL + CoT works great for DeepSeek-R1 & o1, but:

1️⃣ Linear-in-log scaling in train & test-time compute
2️⃣ Likely bounded by difficulty of training problems

Meet STP—a self-play algorithm that conjectures & proves indefinitely, scaling better! 🧠⚡🧵🧵

arxiv.org/abs/2502.00212
Pierfrancesco Beneventano (@pierbeneventano)

Arseniy and I have, I believe, taken a step toward properly characterizing how and when mini-batch SGD training exhibits the Edge of Stability / Break-Even Point (Stanisław Jastrzębski, Jeremy Cohen).
Link: arxiv.org/abs/2412.20553
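
As background, Edge of Stability is usually diagnosed by tracking the sharpness (the largest Hessian eigenvalue of the training loss) and checking whether it rises until it hovers near the stability threshold 2/η. The sketch below estimates that quantity with power iteration on Hessian-vector products; it is a generic diagnostic, not code from the paper, and loss_fn/params are placeholder names.

import torch

def sharpness(loss_fn, params, n_iters=20):
    # Estimate the top Hessian eigenvalue of loss_fn() w.r.t. params
    # via power iteration on Hessian-vector products.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = 0.0
    for _ in range(n_iters):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv]).detach()
        eig = (v @ hv).item()             # Rayleigh quotient estimate (||v|| = 1)
        v = hv / (hv.norm() + 1e-12)
    return eig

# In an Edge-of-Stability run, sharpness(...) climbs during training and then
# oscillates around 2 / learning_rate instead of growing further.
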
Shengguang Wu (@shengguangwu)

❓Do VLMs really pay attention to image inputs?

😮Shockingly, a VLM is most likely to generate the response below about 𝒶 𝒹𝑜𝑔 when given 𝐧𝐨 𝐢𝐦𝐚𝐠𝐞 𝐚𝐭 𝐚𝐥𝐥—and least likely when shown the correct image.

🏆To tackle this 𝐯𝐢𝐬𝐮𝐚𝐥 𝐧𝐞𝐠𝐥𝐞𝐜𝐭, we introduce a …
William Merrill (@lambdaviking)

How does the depth of a transformer affect reasoning capabilities? New preprint by myself and Ashish Sabharwal (@Ashish_S_AI) shows that a little depth goes a long way to increase transformers’ expressive power

We take this as encouraging for further research on looped transformers!🧵
Zhiyuan Zeng (@zhiyuanzeng_)

Is a single accuracy number all we can get from model evals?🤔
🚨Does NOT tell where the model fails
🚨Does NOT tell how to improve it

Introducing EvalTree🌳
🔍identifying LM weaknesses in natural language
🚀weaknesses serve as actionable guidance

(paper&demo 🔗in🧵) [1/n]

Christina Baek (@_christinabaek)

Are current reasoning models optimal for test-time scaling? 🌠
No! Models make the same incorrect guess over and over again.

We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math!

1/N
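
For reference, WiSE-FT style weight ensembling is just a linear interpolation of two checkpoints that share an architecture. A minimal PyTorch sketch is below; which checkpoints to merge and how to choose alpha for the math gains described above is the thread's contribution and is not shown here.

import copy
import torch

def wise_ft(base_model, finetuned_model, alpha=0.5):
    # Interpolate parameters: alpha=0 returns the base weights, alpha=1 the
    # fine-tuned weights. Integer buffers (e.g. BatchNorm counters) are copied
    # from the base model unchanged.
    base_sd = base_model.state_dict()
    ft_sd = finetuned_model.state_dict()
    merged_sd = {}
    for k in base_sd:
        if torch.is_floating_point(base_sd[k]):
            merged_sd[k] = (1.0 - alpha) * base_sd[k] + alpha * ft_sd[k]
        else:
            merged_sd[k] = base_sd[k]
    merged = copy.deepcopy(base_model)
    merged.load_state_dict(merged_sd)
    return merged
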
Yushun Zhang (@ericzhang0410)

New paper alert!  We report that the Hessian of NNs has a very special structure: 
1. it appears to be a "block-diagonal-block-circulant" matrix at initialization;
2. then it quickly evolves into a "near-block-diagonal" matrix along training.

We then theoretically reveal two …
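
A toy way to look at this kind of block structure yourself: compute the full parameter Hessian of a tiny MLP and print the Frobenius norm of each parameter-group block, as sketched below. The network sizes and data are arbitrary placeholders; the paper's claims concern the Hessian of much larger networks, so treat this purely as an inspection recipe.

import torch

torch.manual_seed(0)

# A 2-layer MLP written as explicit ops on one flat parameter vector,
# so autograd can give us the full parameter Hessian.
d_in, d_hid, d_out, n = 4, 5, 3, 64
X = torch.randn(n, d_in)
y = torch.randint(0, d_out, (n,))
sizes = [d_in * d_hid, d_hid, d_hid * d_out, d_out]   # W1, b1, W2, b2

def loss_of(theta):
    W1, b1, W2, b2 = torch.split(theta, sizes)
    h = torch.tanh(X @ W1.view(d_in, d_hid) + b1)
    logits = h @ W2.view(d_hid, d_out) + b2
    return torch.nn.functional.cross_entropy(logits, y)

theta0 = 0.1 * torch.randn(sum(sizes))
H = torch.autograd.functional.hessian(loss_of, theta0)   # (P, P) matrix

# Frobenius norm of each parameter-group block: near-block-diagonal structure
# shows up as large diagonal blocks (W1-W1, b1-b1, ...) and comparatively
# small off-diagonal blocks (W1-W2, ...).
offsets = [0]
for s in sizes:
    offsets.append(offsets[-1] + s)
for i in range(len(sizes)):
    row = [H[offsets[i]:offsets[i+1], offsets[j]:offsets[j+1]].norm().item()
           for j in range(len(sizes))]
    print(["%.3f" % v for v in row])
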
Percy Liang (@percyliang)

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision: