Michael Tang @ ICLR (@_michaeltang_)'s Twitter Profile
Michael Tang @ ICLR

@_michaeltang_

post-training / prev @princeton

ID: 1293830551

http://michaeltang.xyz | 24-03-2013 06:02:16

66 Tweets

348 Followers

776 Following

Michael Tang @ ICLR (@_michaeltang_)

just landed in thailand, presenting referral augmentation ("PageRank x retrieval") at ACL 2024! would love to chat about anything related to retrieval, synthetic data, or code gen :)

Michael Tang @ ICLR (@_michaeltang_)

ben wrote up extended thoughts on gpt4 attempting usaco with human-in-the-loop! trajectories contain lots of insights, incl. a recurring issue where the model identifies a bug while reflecting but its code revision then fails to fix it 👀

Ben Eysenbach (@ben_eysenbach)

I'm excited to share recent work with Grace Liu and Michael Tang on exploration in RL!

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Paper, code, and videos: graliuce.github.io/sgcrl/

A thread.

Michael Tang @ ICLR (@_michaeltang_)

Next-gen evals should be interactive 👀
• Long-horizon env interaction tests realistic planning (SWEBench, WebShop)
• Human interaction draws out hidden capabilities (USACO human-in-loop)
• Multi-turn interaction with a bad actor better captures the surface area for attacks

Michael Tang @ ICLR (@_michaeltang_)

If you're at NeurIPS, stop by our workshop oral! Grace Liu will be presenting on single-goal contrastive RL at 2:45-3pm on Sun 12/15, don't miss it :) imol-workshop.github.io/pages/program/