Nan Jiang (@nanjiang_cs) 's Twitter Profile
Nan Jiang

@nanjiang_cs

machine learning researcher, with focus on reinforcement learning. assoc prof @ uiuc cs. Course on RL theory (w/ videos): nanjiang.cs.illinois.edu/cs542

ID: 925800751628279808

Link: http://nanjiang.cs.illinois.edu/ · Joined: 01-11-2017 19:04:34

2.2K Tweets

8.8K Followers

72 Following

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

will miss ICLR but check out Yuheng’s works! One on POMDPs showing statistical separation between model-based and model-free OPE, and one on iterative nash PO for LLMs. (unfortunately he’s not going either given recent travel advisory despite the oral…)

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

Akshay presenting InferenceTimePessimism, a new alternative to BoN sampling for scaling test-time compute. From our recent paper here: arxiv.org/abs/2503.21878

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

Is Best-of-N really the best we can do for language model inference?   New algo & paper: 🚨InferenceTimePessimism🚨 Led by the amazing Audrey Huang (Audrey Huang) with Adam Block, Qinghua Liu, Nan Jiang (Nan Jiang), and Akshay Krishnamurthy. Appearing at ICML '25. 1/11

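For context on the baseline the paper challenges, here is a minimal sketch of Best-of-N (BoN) sampling for test-time compute; the names `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a reward model, not the paper's actual API.

```python
# Hedged sketch of Best-of-N (BoN) sampling: draw N completions,
# score each with a reward model, keep the highest-scoring one.
import random

def generate(prompt, seed):
    # Stand-in for an LLM call; returns a (completion, quality) pair.
    rng = random.Random(seed)
    return f"completion-{seed}", rng.random()

def reward(completion, quality):
    # Stand-in for a learned reward model's score.
    return quality

def best_of_n(prompt, n=8):
    # Sample n candidates and return the one the reward model prefers.
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: reward(*c))
```

BoN is simple but can over-optimize an imperfect reward model as N grows, which is the failure mode pessimism-style alternatives aim to mitigate.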
Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

RL and post-training play a central role in giving language models advanced reasoning capabilities, but many algorithmic and scientific questions remain unanswered. Join us at FoPT @ COLT '25 to explore pressing emerging challenges and opportunities for theory to bring clarity.

Yu-Xiang Wang (@yuxiangw_cs) 's Twitter Profile Photo

Peyman Milanfar NSF is the only funding agency that evaluates projects solely through merit-based reviews. Your experience is a valid data point, but for most junior faculty who start off without established industry/defense agency connections, NSF is about their only bet to get started.

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

x.com/Nived_Rajarama… Quick reminder post-NeurIPS: The deadline for our workshop on Foundations of Post-Training (FoPT) at COLT 2025 is coming up this Monday, May 19!

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

sample size is extended to 400 (at least for AC) and gosh, does that improve the matching quality. at one point I thought no one was doing RL that involves even a touch of theory...

Sam Power (@sp_monte_carlo) 's Twitter Profile Photo

In the interim, I wanted to advertise our YouTube channel - youtube.com/@montecarlosem… - which contains recordings for the bulk of our talks so far (sites.google.com/view/monte-car…, sites.google.com/view/monte-car…). I encourage you to catch up and enjoy them over the intervening months!

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

It’s almost hilarious that he chose to mention the US space program. Has anyone told him what happened when a founding figure of JPL was expelled for the exact same reasons they use today?

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

Given the sheer number of ppl interested in PG methods nowadays I'm sure innocent "rediscoveries" like this are happening every day. Otoh, due diligence takes minimal effort today as you can just DeepResearch. All it takes is the sense/taste to ask "no way this is not done b4"...

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

I've received multiple emails from nxtai-conference.com. Having "AI+quantum" as keywords looks very much like a scam at first glance (sorry!) but the speaker list seems very legit. What's going on with this...? Also can't find organizer info. Are there academics behind this?

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

Re error propagation: if you believe model-based is a solution but also want the benefits of model-free, perhaps time to investigate (never thoroughly-studied) bellman-error minimization... BRM is, in a way, closer to model-based than TD (small revelation from my l4dc talk)

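To make the BRM-vs-TD distinction concrete, here is a hedged sketch of the two updates on a single transition (s, a, r, s'); the tabular `Q`, learning rate, and discount are illustrative assumptions, not from the talk.

```python
# Sketch contrasting semi-gradient TD with Bellman residual
# minimization (BRM, a.k.a. residual gradient) on one transition.
import numpy as np

gamma = 0.9  # assumed discount factor

def td_update(Q, s, a, r, s2, lr=0.1):
    # TD treats the bootstrap target r + gamma * max Q(s') as a
    # constant: only Q[s, a] receives a gradient.
    residual = Q[s, a] - (r + gamma * np.max(Q[s2]))
    Q[s, a] -= lr * residual

def brm_update(Q, s, a, r, s2, lr=0.1):
    # BRM descends the full gradient of the squared Bellman residual,
    # so the next-state value is adjusted too -- one sense in which
    # BRM behaves more like model-based evaluation than TD.
    # (Caveat: naive BRM suffers the double-sampling issue in
    # stochastic environments; this sketch ignores that.)
    residual = Q[s, a] - (r + gamma * np.max(Q[s2]))
    a2 = np.argmax(Q[s2])
    Q[s, a] -= lr * residual
    Q[s2, a2] += lr * gamma * residual
```

The only difference is the extra gradient term flowing into Q[s2, a2] in the BRM update.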
Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

Decision-making with LLM can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇

Eugene Vinitsky 🍒🦋 (@eugenevinitsky) 's Twitter Profile Photo

We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.

TalkRL Podcast (@talkrlpodcast) 's Twitter Profile Photo

E66: Satinder Singh: The Origin Story of RLDM @ RLDM 2025. Professor Satinder Singh of Google DeepMind and the University of Michigan is a co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).

Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

HUGE congrats to Wanqiao Xu -- this paper just got the best theory paper award at ICML 2025 EXAIT (Exploration in AI) -- proposing a new provably efficient exploration algorithm 🛣️ with the right level of abstraction to leverage the strengths of LLMs 💭.
