Tongzheng Ren (@rtz19970824) 's Twitter Profile
Tongzheng Ren

@rtz19970824

QR @citsecurities. CS PhD @UTCompSci. Previously QR Intern @citsecurities, Student Researcher @GoogleDeepMind. Interested in Applied Math & Prob.

ID: 4647878532

Link: http://cs.utexas.edu/~tzren · Joined: 25-12-2015 07:40:07

187 Tweets

181 Followers

258 Following

Sander Dieleman (@sedielem) 's Twitter Profile Photo

Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊 Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼 Diffusion is just spectral autoregression 🤷🌈
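
The "spectral autoregression" framing is easy to see numerically: natural signals put most of their power at low frequencies, so white noise of growing variance swamps the high frequencies first and the low ones last. A small numpy sketch of that intuition (my toy illustration with a synthetic 1/f signal, not Sander's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
freqs = np.fft.rfftfreq(n, d=1.0)

# Toy "natural" signal: amplitude ~ 1/f, i.e. a 1/f^2 power spectrum.
amp = np.zeros_like(freqs)
amp[1:] = 1.0 / freqs[1:]
phase = rng.uniform(0.0, 2 * np.pi, size=freqs.shape)
signal = np.fft.irfft(amp * np.exp(1j * phase), n)

for sigma in [0.5, 2.0, 8.0]:
    noisy = signal + sigma * rng.standard_normal(n)
    power = np.abs(np.fft.rfft(noisy)) ** 2
    # White noise contributes ~ n * sigma^2 expected power to every bin,
    # so bins whose signal power sits below that floor are "submerged".
    floor = n * sigma**2
    submerged = freqs[1:][power[1:] < floor]
    print(f"sigma={sigma:3.1f}  lowest submerged frequency: {submerged.min():.4f}")
```

As sigma grows, the lowest submerged frequency creeps downward: denoising at decreasing noise levels therefore reconstructs coarse scales before fine ones, which is the autoregression-over-frequencies reading of diffusion.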

Yu-Xiang Wang (@yuxiangw_cs) 's Twitter Profile Photo

Incredible and thought-provoking talk on the cultural open problems of theory in the age of GPUs — and how junior researchers could adapt, by the great Matus Telgarsky. Slides: cims.nyu.edu/~matus/neurips…

Constantine Caramanis (@cmcaram) 's Twitter Profile Photo

🚀 🇬🇷 A year in the making! I’ve just completed a set of 21 lectures in Machine Learning, in Greek, designed for high school students. The course introduces key ML concepts, coding in Python & PyTorch, and real-world AI applications. #MachineLearning #AI #EdTech #Greece

Yifei Wang (@yifeiwang77) 's Twitter Profile Photo


Great to see a reviving interest in long-context LLMs these days (kudos to awesome evals and archs)! But are you training long-context LLMs wisely (to save the huge cost)?

In a recent #ICLR2025 paper, we show that vanilla next-token prediction could be very suboptimal(!!) for
Eric Zhao (@ericzhao28) 's Twitter Profile Photo

(More news!) I wrote a new blog post on our current understanding of multi-distribution learning (MDL) in 2025. I give a gentle intro, 🌶️ but belated updates to our COLT open problem, and discuss some fundamental unresolved questions. Link ⬇️

William Merrill (@lambdaviking) 's Twitter Profile Photo


How does the depth of a transformer affect reasoning capabilities? New preprint by myself and Ashish Sabharwal (@Ashish_S_AI) shows that a little depth goes a long way to increase transformers’ expressive power

We take this as encouraging for further research on looped transformers!🧵
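
A looped transformer in this sense reuses one block's weights for several iterations, so effective depth grows without new parameters. A minimal PyTorch sketch of the architecture being pointed at (my illustration; layer sizes and loop count are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """One weight-shared encoder block applied n_loops times."""
    def __init__(self, d_model=64, nhead=4, n_loops=6):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        # Effective depth = n_loops, parameter count = a single layer.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

model = LoopedTransformer()
tokens = torch.randn(2, 16, 64)   # (batch, seq, d_model)
print(model(tokens).shape)        # torch.Size([2, 16, 64])
```
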
Zhuoran Yang (@zhuoran_yang) 's Twitter Profile Photo

[New paper on in-context learning] "In-Context Linear Regression Demystified" (link: arxiv.org/abs/2503.12734). Joint work with Jianliang He, Xintian Pan, and Siyu Chen. We establish a rather complete understanding of how one-layer multi-head attention solves in-context linear regression,
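
The task studied here: a prompt carries labeled pairs (x_i, y_i) plus a query x_q, and the model must predict y_q in-context. A numpy sketch of the target computation (my illustration; standard constructions in this literature show a linear-attention layer can realize exactly this preconditioned least-squares step):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200
w_star = rng.standard_normal(d)           # hidden task vector
X = rng.standard_normal((n, d))           # in-context inputs x_1..x_n
y = X @ w_star                            # in-context labels (noiseless)
x_q = rng.standard_normal(d)              # query token

# A linear-attention head can accumulate sum_i y_i x_i (value-key outer
# products) and apply a learned preconditioner Gamma to the query; with
# Gamma = (X^T X)^{-1} this reproduces the least-squares predictor.
gamma = np.linalg.inv(X.T @ X)
w_hat = gamma @ (X.T @ y)
print(np.allclose(w_hat, w_star))         # True: task recovered
print(abs(x_q @ w_hat - x_q @ w_star))    # ~0 prediction error on the query
```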

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo


Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model?  

New paper with Zak Mhammedi and Dhruv Rohatgi:  
The Computational Role of the Base Model in Exploration

thread:
Chi Jin (@chijinml) 's Twitter Profile Photo

We're releasing a new Pokémon dataset: 2 million games from Showdown across 37+ competitive formats (Gen 1–9, Elo 1000-1800+), as part of our PokeChamp project! Perfect time to build your own Pokémon bot. 🤖⚔️ Check it out!

Mengdi Wang (@mengdiwang10) 's Twitter Profile Photo


🚨 Discover the Science of LLM! We uncover how LLMs (Llama3-70B) achieve abstract reasoning through emergent symbolic mechanisms: 

1️⃣ Symbol Abstraction Heads: Early layers convert input tokens into abstract variables based on their relationships. 
2️⃣ Symbolic Induction Heads:
Clémentine Dominé 🍊 (@clementinedomi6) 's Twitter Profile Photo


🚀 Exciting news! Our paper "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks" has been accepted at ICLR 2025!

arxiv.org/abs/2409.14623

A thread on how relative weight initialization shapes learning dynamics in deep networks. 🧵 (1/9)
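
The lazy/rich distinction shows up even in a scalar two-layer linear net: a larger balanced initialization gives exponential loss decay from step 0, while a tiny balanced initialization sits on a plateau before an abrupt sigmoidal drop. A numpy toy of that contrast (my illustration; the paper derives the exact dynamics):

```python
import numpy as np

def train(init_scale, steps=600, lr=0.01, target=1.0, log_every=100):
    # Scalar two-layer linear net f(x) = w2 * w1 * x with balanced init,
    # population loss L = 0.5 * (w1*w2 - target)^2, plain gradient descent.
    w1 = w2 = init_scale
    print(f"init_scale = {init_scale}")
    for step in range(steps + 1):
        err = w1 * w2 - target
        if step % log_every == 0:
            print(f"  step {step:3d}  loss {0.5 * err**2:.4f}")
        w1, w2 = w1 - lr * err * w2, w2 - lr * err * w1

train(1.5)    # larger balanced init: smooth exponential decay (lazier regime)
train(0.02)   # tiny balanced init: long plateau, then a sigmoidal drop (rich)
```
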
Taiji Suzuki (@btreetaiji) 's Twitter Profile Photo


We are presenting "State Space Models are Provably Comparable to Transformers in Dynamic Token Selection" Sat 26 Apr 10 a.m. — 12:30 p.m.

- Capabilities of SSMs combined with FNNs
- SSMs are comparable to Transformers in extracting tokens depending on the context
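
For reference, the object being compared with attention: a linear state-space layer is a recurrence over the sequence, and "dynamic token selection" in modern variants comes from making its parameters input-dependent. A minimal numpy sketch of the fixed-parameter recurrence (my illustration, not the paper's construction):

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # sequential scan over the input sequence
        h = A @ h + B * x_t      # state update (B: (d_state,), scalar input)
        ys.append(C @ h)         # linear readout of the state
    return np.array(ys)

rng = np.random.default_rng(0)
d_state, seq_len = 4, 10
A = 0.9 * np.eye(d_state)        # stable diagonal transition
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
y = ssm_scan(A, B, C, rng.standard_normal(seq_len))
print(y.shape)                   # (10,)
```
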
Runtian Zhai (@runtianzhai) 's Twitter Profile Photo

Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya Sutskever predicted? My PhD thesis builds the contexture theory to answer the above. Blog: runtianzhai.com/thesis Paper: arxiv.org/abs/2504.19792 🧵1/12

Simon Shaolei Du (@simonshaoleidu) 's Twitter Profile Photo

PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770 Led by Ruizhe Shi and Minhak Song
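
For context on the comparison: DPO fits a logistic loss on policy-vs-reference log-ratios directly, while PPO optimizes a separately learned reward model, which is why representability of the reward versus the optimal policy can split the two. A minimal torch sketch of the DPO objective (toy log-probabilities, not a full training loop):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: logistic loss on the margin of policy-vs-reference log-ratios."""
    chosen_ratio = logp_chosen - ref_chosen        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = logp_rejected - ref_rejected  # log pi(y_l|x) - log pi_ref(y_l|x)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Sequence log-probs under the policy and the frozen reference model (toy values).
logp_c, logp_r = torch.tensor([-12.0]), torch.tensor([-15.0])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-14.0])
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))      # ~0.598
```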

Csaba Szepesvari (@csabaszepesvari) 's Twitter Profile Photo

First position paper I ever wrote. "Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence" arxiv.org/abs/2506.23908 Background: I'd like LLMs to help me do math, but statistical learning seems inadequate to make this happen. What do you all think?

Gokul Swamy (@g_k_swamy) 's Twitter Profile Photo

Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find out this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…

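
One way to see why such heuristics can matter: group-normalized (GRPO-style) advantages computed from random rewards are zero-mean by construction, so any systematic policy drift has to enter through algorithmic asymmetries. A numpy sketch of one candidate, ratio clipping (my assumption about which heuristic to illustrate; see the blog post for the actual analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# GRPO-style group advantages from *random* rewards: zero-mean by
# construction, so the reward signal itself carries no direction.
rewards = rng.uniform(size=8)                    # random rewards for 8 samples
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(adv.mean())                                # ~0

def clipped_surrogate(ratio, adv, eps=0.2):
    # PPO/GRPO surrogate: the min takes the pessimistic branch, so gains
    # from positive advantages are capped at (1+eps)*adv while penalties
    # from negative advantages are not capped below.
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)

for r in [0.5, 1.0, 1.5]:
    print(r, clipped_surrogate(r, +1.0), clipped_surrogate(r, -1.0))
```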