Tongzheng Ren (@rtz19970824) 's Twitter Profile
Tongzheng Ren

@rtz19970824

QR @citsecurities. CS PhD @UTCompSci. Previously QR Intern @citsecurities, Student Researcher @GoogleDeepMind. Interested in Applied Math & Prob.

ID: 4647878532

Link: http://cs.utexas.edu/~tzren · Joined: 25-12-2015 07:40:07

187 Tweets

181 Followers

258 Following

Sander Dieleman (@sedielem) 's Twitter Profile Photo

Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊 Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼 Diffusion is just spectral autoregression 🤷🌈
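
The "spectral autoregression" framing is easy to see numerically: natural signals put most of their power at low frequencies, so white noise of growing variance swamps the high frequencies first and the low ones last. A small numpy sketch of that intuition (my toy illustration with a synthetic 1/f signal, not Sander's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
freqs = np.fft.rfftfreq(n, d=1.0)

# Toy "natural" signal: amplitude ~ 1/f, i.e. a 1/f^2 power spectrum.
amp = np.zeros_like(freqs)
amp[1:] = 1.0 / freqs[1:]
phase = rng.uniform(0.0, 2 * np.pi, size=freqs.shape)
signal = np.fft.irfft(amp * np.exp(1j * phase), n)

for sigma in [0.5, 2.0, 8.0]:
    noisy = signal + sigma * rng.standard_normal(n)
    power = np.abs(np.fft.rfft(noisy)) ** 2
    # White noise contributes ~ n * sigma^2 expected power to every bin,
    # so bins whose signal power sits below that floor are "submerged".
    floor = n * sigma**2
    submerged = freqs[1:][power[1:] < floor]
    print(f"sigma={sigma:3.1f}  lowest submerged frequency: {submerged.min():.4f}")
```

As sigma grows, the lowest submerged frequency creeps downward: denoising at decreasing noise levels therefore reconstructs coarse scales before fine ones, which is the autoregression-over-frequencies reading of diffusion.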

Yu-Xiang Wang (@yuxiangw_cs) 's Twitter Profile Photo

Incredible and thought-provoking talk on the cultural open problems of theory in the age of GPUs — and how junior researchers could adapt, by the great Matus Telgarsky. Slides: cims.nyu.edu/~matus/neurips…

Constantine Caramanis (@cmcaram) 's Twitter Profile Photo

🚀 🇬🇷 A year in the making! I’ve just completed a set of 21 lectures in Machine Learning, in Greek, designed for high school students. The course introduces key ML concepts, coding in Python & PyTorch, and real-world AI applications. #MachineLearning #AI #EdTech #Greece

Yifei Wang (@yifeiwang77) 's Twitter Profile Photo


Great to see a reviving interest in long-context LLMs these days (kudos to awesome evals and archs)! But are you training long-context LLMs wisely (to save the huge cost)?

In a recent #ICLR2025 paper, we show that vanilla next-token prediction could be very suboptimal(!!) for
Eric Zhao (@ericzhao28) 's Twitter Profile Photo

(More news!) I wrote a new blog post on our current understanding of multi-distribution learning (MDL) in 2025. I give a gentle intro, 🌶️ but belated updates to our COLT open problem, and discuss some fundamental unresolved questions. Link ⬇️

William Merrill (@lambdaviking) 's Twitter Profile Photo


How does the depth of a transformer affect reasoning capabilities? New preprint by myself and Ashish Sabharwal (@Ashish_S_AI) shows that a little depth goes a long way to increase transformers’ expressive power

We take this as encouraging for further research on looped transformers!🧵
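
A looped transformer in this sense reuses one block's weights for several iterations, so effective depth grows without new parameters. A minimal PyTorch sketch of the architecture being pointed at (my illustration; layer sizes and loop count are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """One weight-shared encoder block applied n_loops times."""
    def __init__(self, d_model=64, nhead=4, n_loops=6):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        # Effective depth = n_loops, parameter count = a single layer.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

model = LoopedTransformer()
tokens = torch.randn(2, 16, 64)   # (batch, seq, d_model)
print(model(tokens).shape)        # torch.Size([2, 16, 64])
```
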
Zhuoran Yang (@zhuoran_yang) 's Twitter Profile Photo

[New paper on in-context learning] "In-Context Linear Regression Demystified" (link: arxiv.org/abs/2503.12734). Joint work with Jianliang He, Xintian Pan, and Siyu Chen. We establish a rather complete understanding of how one-layer multi-head attention solves in-context linear regression,
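
The task studied here: a prompt carries labeled pairs (x_i, y_i) plus a query x_q, and the model must predict y_q in-context. A numpy sketch of the target computation (my illustration; standard constructions in this literature show a linear-attention layer can realize exactly this preconditioned least-squares step):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200
w_star = rng.standard_normal(d)           # hidden task vector
X = rng.standard_normal((n, d))           # in-context inputs x_1..x_n
y = X @ w_star                            # in-context labels (noiseless)
x_q = rng.standard_normal(d)              # query token

# A linear-attention head can accumulate sum_i y_i x_i (value-key outer
# products) and apply a learned preconditioner Gamma to the query; with
# Gamma = (X^T X)^{-1} this reproduces the least-squares predictor.
gamma = np.linalg.inv(X.T @ X)
w_hat = gamma @ (X.T @ y)
print(np.allclose(w_hat, w_star))         # True: task recovered
print(abs(x_q @ w_hat - x_q @ w_star))    # ~0 prediction error on the query
```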

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo


Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model?  

New paper with Zak Mhammedi and Dhruv Rohatgi:  
The Computational Role of the Base Model in Exploration

thread:
Chi Jin (@chijinml) 's Twitter Profile Photo

We're releasing a new Pokémon dataset: 2 million games from Showdown across 37+ competitive formats (Gen 1–9, Elo 1000-1800+), as part of our PokeChamp project! Perfect time to build your own Pokémon bot. 🤖⚔️ Check it out!

Mengdi Wang (@mengdiwang10) 's Twitter Profile Photo


🚨 Discover the Science of LLM! We uncover how LLMs (Llama3-70B) achieve abstract reasoning through emergent symbolic mechanisms: 

1️⃣ Symbol Abstraction Heads: Early layers convert input tokens into abstract variables based on their relationships. 
2️⃣ Symbolic Induction Heads:
Clémentine Dominé 🍊 (@clementinedomi6) 's Twitter Profile Photo


🚀 Exciting news! Our paper "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks" has been accepted at ICLR 2025!

arxiv.org/abs/2409.14623

A thread on how relative weight initialization shapes learning dynamics in deep networks. 🧵 (1/9)
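
The lazy/rich distinction shows up even in a scalar two-layer linear net: a larger balanced initialization gives exponential loss decay from step 0, while a tiny balanced initialization sits on a plateau before an abrupt sigmoidal drop. A numpy toy of that contrast (my illustration; the paper derives the exact dynamics):

```python
import numpy as np

def train(init_scale, steps=600, lr=0.01, target=1.0, log_every=100):
    # Scalar two-layer linear net f(x) = w2 * w1 * x with balanced init,
    # population loss L = 0.5 * (w1*w2 - target)^2, plain gradient descent.
    w1 = w2 = init_scale
    print(f"init_scale = {init_scale}")
    for step in range(steps + 1):
        err = w1 * w2 - target
        if step % log_every == 0:
            print(f"  step {step:3d}  loss {0.5 * err**2:.4f}")
        w1, w2 = w1 - lr * err * w2, w2 - lr * err * w1

train(1.5)    # larger balanced init: smooth exponential decay (lazier regime)
train(0.02)   # tiny balanced init: long plateau, then a sigmoidal drop (rich)
```
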
Taiji Suzuki (@btreetaiji) 's Twitter Profile Photo


We are presenting "State Space Models are Provably Comparable to Transformers in Dynamic Token Selection" Sat 26 Apr 10 a.m. — 12:30 p.m.

- Capabilities of SSMs combined with FNNs
- SSMs are comparable to Transformers in extracting tokens depending on the context
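
For reference, the object being compared with attention: a linear state-space layer is a recurrence over the sequence, and "dynamic token selection" in modern variants comes from making its parameters input-dependent. A minimal numpy sketch of the fixed-parameter recurrence (my illustration, not the paper's construction):

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # sequential scan over the input sequence
        h = A @ h + B * x_t      # state update (B: (d_state,), scalar input)
        ys.append(C @ h)         # linear readout of the state
    return np.array(ys)

rng = np.random.default_rng(0)
d_state, seq_len = 4, 10
A = 0.9 * np.eye(d_state)        # stable diagonal transition
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
y = ssm_scan(A, B, C, rng.standard_normal(seq_len))
print(y.shape)                   # (10,)
```
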
Runtian Zhai (@runtianzhai) 's Twitter Profile Photo

Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya Sutskever predicted? My PhD thesis builds the contexture theory to answer the above. Blog: runtianzhai.com/thesis Paper: arxiv.org/abs/2504.19792 🧵1/12

Simon Shaolei Du (@simonshaoleidu) 's Twitter Profile Photo

PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770 Led by Ruizhe Shi and Minhak Song
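
For context on the comparison: DPO fits a logistic loss on policy-vs-reference log-ratios directly, while PPO optimizes a separately learned reward model, which is why representability of the reward versus the optimal policy can split the two. A minimal torch sketch of the DPO objective (toy log-probabilities, not a full training loop):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: logistic loss on the margin of policy-vs-reference log-ratios."""
    chosen_ratio = logp_chosen - ref_chosen        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = logp_rejected - ref_rejected  # log pi(y_l|x) - log pi_ref(y_l|x)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Sequence log-probs under the policy and the frozen reference model (toy values).
logp_c, logp_r = torch.tensor([-12.0]), torch.tensor([-15.0])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-14.0])
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))      # ~0.598
```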

Csaba Szepesvari (@csabaszepesvari) 's Twitter Profile Photo

First position paper I ever wrote. "Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence" arxiv.org/abs/2506.23908 Background: I'd like LLMs to help me do math, but statistical learning seems inadequate to make this happen. What do you all think?

Gokul Swamy (@g_k_swamy) 's Twitter Profile Photo

Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find out this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…

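
One way to see why such heuristics can matter: group-normalized (GRPO-style) advantages computed from random rewards are zero-mean by construction, so any systematic policy drift has to enter through algorithmic asymmetries. A numpy sketch of one candidate, ratio clipping (my assumption about which heuristic to illustrate; see the blog post for the actual analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# GRPO-style group advantages from *random* rewards: zero-mean by
# construction, so the reward signal itself carries no direction.
rewards = rng.uniform(size=8)                    # random rewards for 8 samples
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(adv.mean())                                # ~0

def clipped_surrogate(ratio, adv, eps=0.2):
    # PPO/GRPO surrogate: the min takes the pessimistic branch, so gains
    # from positive advantages are capped at (1+eps)*adv while penalties
    # from negative advantages are not capped below.
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)

for r in [0.5, 1.0, 1.5]:
    print(r, clipped_surrogate(r, +1.0), clipped_surrogate(r, -1.0))
```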