Edward Hu (@edward_s_hu) 's Twitter Profile
Edward Hu

@edward_s_hu

cs phd @penn, student researcher @MSFTResearch. investigating ai / rl / intelligence.

ID: 4583386580

Link: http://www.edwardshu.com · Joined: 17-12-2015 09:58:06

145 Tweets

745 Followers

304 Following

Edward Hu (@edward_s_hu) 's Twitter Profile Photo

just coded an RL env for Suika. try it out and let me know if your agent can get the watermelon! github.com/edwhu/suika_rl
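For context, the linked repo provides a reinforcement-learning environment for the Suika (watermelon) game. Below is a minimal sketch of a random-agent loop against a Gymnasium-style environment; the `SuikaEnv` class name, import path, and constructor are illustrative assumptions, not the confirmed API of github.com/edwhu/suika_rl.

```python
# Minimal sketch: drive an assumed Gymnasium-compatible Suika env with a
# random agent. Names below are hypothetical; check the repo for the real API.
from suika_rl import SuikaEnv  # hypothetical import path

env = SuikaEnv()  # assumed constructor

obs, info = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # replace with your agent's policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"episode return: {total_reward}")
env.close()
```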

Nina Singh, MD (@nina_singh_) 's Twitter Profile Photo

So excited and grateful to share that I matched UC San Francisco for internal medicine residency today! Thank you to my mentors, family, friends, and fiancé Edward Hu for all of your support during my med school journey! Couldn’t have done it without you ❤️

Edward Hu (@edward_s_hu) 's Twitter Profile Photo

Pi0 really did work for us on the first try. No camera calibration, controller tuning, etc. The failure cases: missed grasps and risk-averse "hedging" behavior. Excited to see how the robotics community improves on this. At the very least, it will be a good baseline.

Edward Hu (@edward_s_hu) 's Twitter Profile Photo

I'll make a tweet before ICLR'25, but this thread captures the essence well. Predicting the next token with transformers is great; but predicting 2 tokens = provably sufficient representation for planning. Let's do that and see what happens.

Edward Hu (@edward_s_hu) 's Twitter Profile Photo

Sometimes, it's expensive to reset in RL (e.g. robotics). It turns out world models do pretty well here. Why?
• learn reset policy in imagination for free
• policy training in world model is resistant to distribution shift
Check out Zhao's TMLR accepted project!

Edward Hu (@edward_s_hu) 's Twitter Profile Photo

Well deserved. Dv3 made my last RL paper super easy to tune for new tasks: just vary model size and UTD ratio. The downside is code complexity. In PPO, I need to tune clipping, entropy, lr, and batch size per new task, but the code is simple. Use the right tool for the job!
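To make the tuning contrast concrete, here is a hedged sketch of what per-task search spaces in that spirit might look like; the knob names and values are illustrative assumptions, not the actual configurations used in the paper.

```python
# Hypothetical per-task tuning surfaces (illustrative only).
# DreamerV3-style: mostly scale-related knobs.
dreamer_v3_sweep = {
    "model_size": ["S", "M", "L"],  # preset network width/depth
    "utd_ratio": [1, 2, 4],         # update-to-data (replay) ratio
}

# PPO-style: several task-sensitive knobs to retune.
ppo_sweep = {
    "clip_range": [0.1, 0.2, 0.3],
    "ent_coef": [0.0, 0.01, 0.05],
    "learning_rate": [3e-4, 1e-4],
    "batch_size": [64, 256],
}
```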

Jason Ma (@jasonma2020) 's Twitter Profile Photo

Introducing Dynamism v1 (DYNA-1) by Dyna Robotics – the first robot foundation model built for round-the-clock, high-throughput dexterous autonomy. Here is a time-lapse video of our model autonomously folding 850+ napkins in a span of 24 hours with • 99.4% success rate — zero

Overleaf (@overleaf) 's Twitter Profile Photo

⚠️ Attention: The site is currently down. Our engineering team is investigating. We will update as soon as possible. You can track progress here: status.overleaf.com Sorry for any inconvenience.

Junyao Shi (@junyaoshi) 's Twitter Profile Photo

On my way to Atlanta to present ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos at IEEE ICRA! Stay tuned for an in-depth post about how ZeroMimic distills zero-shot policies from web human videos. 🌐 Project site: zeromimic.github.io