Nan Jiang (@nanjiang_cs) 's Twitter Profile
Nan Jiang

@nanjiang_cs

machine learning researcher, with focus on reinforcement learning. assoc prof @ uiuc cs. Course on RL theory (w/ videos): nanjiang.cs.illinois.edu/cs542

ID: 925800751628279808

Link: http://nanjiang.cs.illinois.edu/ · Joined: 01-11-2017 19:04:34

2.2K Tweets

8.8K Followers

72 Following

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

will miss ICLR but check out Yuheng’s works! One on POMDPs showing statistical separation between model-based and model-free OPE, and one on iterative nash PO for LLMs. (unfortunately he’s not going either given recent travel advisory despite the oral…)

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

Akshay presenting InferenceTimePessimism, a new alternative to BoN sampling for scaling test-time compute. From our recent paper here: arxiv.org/abs/2503.21878

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

Is Best-of-N really the best we can do for language model inference?   New algo & paper: 🚨InferenceTimePessimism🚨 Led by the amazing Audrey Huang (Audrey Huang) with Adam Block, Qinghua Liu, Nan Jiang (Nan Jiang), and Akshay Krishnamurthy. Appearing at ICML '25. 1/11

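For context on the baseline the paper challenges, here is a minimal sketch of Best-of-N (BoN) sampling for test-time compute; the names `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a reward model, not the paper's actual API.

```python
# Hedged sketch of Best-of-N (BoN) sampling: draw N completions,
# score each with a reward model, keep the highest-scoring one.
import random

def generate(prompt, seed):
    # Stand-in for an LLM call; returns a (completion, quality) pair.
    rng = random.Random(seed)
    return f"completion-{seed}", rng.random()

def reward(completion, quality):
    # Stand-in for a learned reward model's score.
    return quality

def best_of_n(prompt, n=8):
    # Sample n candidates and return the one the reward model prefers.
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: reward(*c))
```

BoN is simple but can over-optimize an imperfect reward model as N grows, which is the failure mode pessimism-style alternatives aim to mitigate.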
Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

RL and post-training play a central role in giving language models advanced reasoning capabilities, but many algorithmic and scientific questions remain unanswered. Join us at FoPT @ COLT '25 to explore pressing emerging challenges and opportunities for theory to bring clarity.

Yu-Xiang Wang (@yuxiangw_cs) 's Twitter Profile Photo

Peyman Milanfar NSF is the only funding agency that evaluates projects solely through merit-based reviews. Your experience is a valid data point, but for most junior faculty who start off without established industry/defense agency connections, NSF is about their only bet to get started.

Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

x.com/Nived_Rajarama… Quick reminder post-NeurIPS: The deadline for our workshop on Foundations of Post-Training (FoPT) at COLT 2025 is coming up this Monday, May 19!

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

sample size is extended to 400 (at least for AC) and gosh, does that improve the matching quality. at one point I thought no one was doing RL that involves even a touch of theory...

Sam Power (@sp_monte_carlo) 's Twitter Profile Photo

In the interim, I wanted to advertise our YouTube channel - youtube.com/@montecarlosem… - which contains recordings for the bulk of our talks so far (sites.google.com/view/monte-car…, sites.google.com/view/monte-car…). I encourage you to catch up and enjoy them over the intervening months!

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

It’s almost hilarious that he chose to mention the US space program. Has anyone told him what happened when a founding figure of JPL was expelled for the exact same reasons they use today?

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

Given the sheer number of ppl interested in PG methods nowadays I'm sure innocent "rediscoveries" like this are happening every day. Otoh, due diligence takes minimal effort today as you can just DeepResearch. All it takes is the sense/taste to ask "no way this is not done b4"...

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

I've received multiple emails from nxtai-conference.com. Having "AI+quantum" as keywords looks very much like a scam at first glance (sorry!) but the speaker list seems very legit. What's going on with this...? Also can't find organizer info. Are there academics behind this?

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

Re error propagation: if you believe model-based is a solution but also want the benefits of model-free, perhaps time to investigate (never thoroughly-studied) bellman-error minimization... BRM is, in a way, closer to model-based than TD (small revelation from my l4dc talk)

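To make the BRM-vs-TD distinction concrete, here is a hedged sketch of the two updates on a single transition (s, a, r, s'); the tabular `Q`, learning rate, and discount are illustrative assumptions, not from the talk.

```python
# Sketch contrasting semi-gradient TD with Bellman residual
# minimization (BRM, a.k.a. residual gradient) on one transition.
import numpy as np

gamma = 0.9  # assumed discount factor

def td_update(Q, s, a, r, s2, lr=0.1):
    # TD treats the bootstrap target r + gamma * max Q(s') as a
    # constant: only Q[s, a] receives a gradient.
    residual = Q[s, a] - (r + gamma * np.max(Q[s2]))
    Q[s, a] -= lr * residual

def brm_update(Q, s, a, r, s2, lr=0.1):
    # BRM descends the full gradient of the squared Bellman residual,
    # so the next-state value is adjusted too -- one sense in which
    # BRM behaves more like model-based evaluation than TD.
    # (Caveat: naive BRM suffers the double-sampling issue in
    # stochastic environments; this sketch ignores that.)
    residual = Q[s, a] - (r + gamma * np.max(Q[s2]))
    a2 = np.argmax(Q[s2])
    Q[s, a] -= lr * residual
    Q[s2, a2] += lr * gamma * residual
```

The only difference is the extra gradient term flowing into Q[s2, a2] in the BRM update.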
Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

Decision-making with LLM can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇

Eugene Vinitsky 🍒🦋 (@eugenevinitsky) 's Twitter Profile Photo

We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.

TalkRL Podcast (@talkrlpodcast) 's Twitter Profile Photo

E66: Satinder Singh: The Origin Story of RLDM @ RLDM 2025. Professor Satinder Singh of Google DeepMind and the University of Michigan is a co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).

Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

HUGE congrats to Wanqiao Xu -- this paper just got the best theory paper award at ICML 2025 EXAIT (Exploration in AI) -- proposing a new provably efficient exploration algorithm 🛣️ with the right level of abstraction to leverage the strengths of LLMs 💭.
