
Ben Lipkin
@ben_lipkin
phd @mit. cogsci, probml, nlp. he/him.
ID: 565036478
http://benlipkin.github.io 28-04-2012 00:50:33
177 Tweets
628 Followers
1.1K Following

Current KL estimation practices in RLHF can yield high-variance and even negative estimates! We propose a provably better estimator that takes only a few lines of code to implement. 🧵👇 w/ Tim Vieira and Ryan Cotterell
paper: arxiv.org/pdf/2504.10637
code: github.com/rycolab/kl-rb

New paper: World models + Program synthesis, by Wasu Top Piriyakulkij
1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code
2. Learns new environments from minutes of experience
3. Positive score on Montezuma's Revenge
4. Compositional generalization to new environments
