Junyi Zhang (@junyi42)'s Twitter Profile
Junyi Zhang

@junyi42

CS Ph.D. student @Berkeley_AI.
B.Eng. in CS @SJTU1896.
Working with @GoogleDeepMind; previously @MSFTResearch.
Vision, generative models, representation learning.

ID: 1547104932918067200

Website: http://junyi42.com · Joined: 13-07-2022 06:25:47

122 Tweets

1.1K Followers

475 Following

Haocheng Xi (@haochengxiucb)'s Twitter Profile Photo

🚀 Introducing #SparseVideoGen: 2x speedup in video generation on HunyuanVideo with high pixel-level fidelity (PSNR = 29)! No training required, no perceptible difference to the human eye! Blog: svg-project.github.io Paper: arxiv.org/abs/2502.01776 Code:
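
As a rough illustration of where such a training-free speedup can come from: attention heads in a video transformer tend to focus either within a frame ("spatial") or on the same location across frames ("temporal"), and each pattern admits a sparse mask. The mask construction below is a simplified assumption for intuition, not SparseVideoGen's actual implementation.

```python
import torch

# Hypothetical sketch: spatial vs. temporal sparse attention for a video
# transformer with F frames of T tokens each (sequence length F*T).

def spatial_mask(F, T):
    frame = torch.arange(F * T) // T
    return frame[:, None] == frame[None, :]   # attend only within the same frame

def temporal_mask(F, T):
    pos = torch.arange(F * T) % T
    return pos[:, None] == pos[None, :]       # same spatial location, all frames

def masked_attention(q, k, v, mask):
    # q, k, v: (seq, dim); mask: (seq, seq) bool. A real sparse kernel would
    # avoid materializing dense scores; this dense version is just for clarity.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Presumably each head is assigned whichever pattern best matches its dense output on a cheap probe, which is what lets this work with no retraining.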

Hanwen Jiang (@hanwenjiang1)'s Twitter Profile Photo

Supervised learning has held 3D vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels:
❌ No supervision of cameras & geometry
✅ Just RGB images
And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy).
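
A hedged sketch of what "zero 3D labels" can mean operationally: hold out a view, let the model estimate its own cameras and scene representation, and supervise purely by re-rendering the held-out view. All method names here are placeholders, not RayZer's API.

```python
import torch

def self_supervised_step(model, images):
    # images: (batch, views, 3, H, W); reconstruct the last view from the rest.
    cams = model.predict_cameras(images)            # no COLMAP, no pose labels
    scene = model.build_scene(images[:, :-1], cams[:, :-1])
    pred = model.render(scene, cams[:, -1])         # re-render the held-out view
    return ((pred - images[:, -1]) ** 2).mean()     # photometric loss only
```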

Junyi Zhang (@junyi42)'s Twitter Profile Photo

Humanoids need to perceive the environment in the real world. Using 4D reconstruction techniques, we turn casual human videos into training data for an environment-aware humanoid policy. Super excited to share: VideoMimic.net

Chung Min Kim (@chungminkim)'s Twitter Profile Photo

Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU
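
For readers new to the problem, the classic building block such a toolkit generalizes and accelerates is the damped-least-squares IK step below; this is a textbook illustration, not PyRoki's API.

```python
import numpy as np

def ik_step(q, fk, jacobian, target, damping=1e-2):
    # q: joint angles; fk(q): end-effector position; jacobian(q): (task_dim, n_joints).
    err = target - fk(q)                           # task-space error
    J = jacobian(q)
    H = J.T @ J + damping * np.eye(J.shape[1])     # Levenberg-Marquardt damping
    return q + np.linalg.solve(H, J.T @ err)       # one damped Gauss-Newton step
```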

Max Fu (@letian_fu)'s Twitter Profile Photo

Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories, usable to train diffusion policies and VLA models.

Junyi Zhang (@junyi42)'s Twitter Profile Photo

Very impressive! At VideoMimic.net, we already learn from 3rd-person human videos + RL for locomotion. Excited to see where this path goes next!

Nate Gillman @ICLR'25 (@gillmanlab)'s Twitter Profile Photo

Ever wish you could turn your video generator into a controllable physics simulator? We're thrilled to introduce Force Prompting! Animate any image with physical forces and get fine-grained control, without needing any physics simulator or 3D assets at inference. 🧵(1/n)

Tianyuan Zhang (@tianyuanzhang99)'s Twitter Profile Photo

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper "Test-Time Training Done Right" proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch
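
To make "large-chunk test-time training" concrete, here is a hedged sketch of the general fast-weight pattern (a simplification, not LaCT's exact recipe): once per big chunk of tokens, take a gradient step that fits a small memory to the chunk's key→value pairs, then read the queries out through it.

```python
import torch

def ttt_chunk(W, K, V, Q, lr=1.0):
    # W: (dim, dim) fast-weight memory; K, V, Q: (chunk_len, dim).
    err = K @ W - V                   # how badly the memory maps keys to values
    grad = K.T @ err / K.shape[0]     # grad of 0.5 * mean_i ||k_i W - v_i||^2
    W = W - lr * grad                 # "update": one step per (large) chunk
    return Q @ W, W                   # "apply": read out, carry memory forward
```

Using chunks of thousands of tokens amortizes each update into large matmuls, which is plausibly what makes a pure-PyTorch implementation hardware-efficient.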

Grace Luo (@graceluo_)'s Twitter Profile Photo

✨New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation*, at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator.🧵

Seohong Park (@seohong_park)'s Twitter Profile Photo

Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).

Kyle Sargent (@kylesargentai)'s Twitter Profile Photo

FlowMo, our paper on diffusion autoencoders for image tokenization, has been accepted to #ICCV2025! See you in Hawaii! 🏄‍♂️

Yutong Bai (@yutongbai1002)'s Twitter Profile Photo

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have:
1) A real, physically grounded and complex action space—not just abstract control signals.
2) Diverse, real-life scenarios and activities.
Or in short: It has to

Ruilong Li (@ruilong_li)'s Twitter Profile Photo

For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
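
A hedged sketch of the core idea as described above (not the paper's exact formulation): transform queries and keys by each token's 4x4 camera matrix so that attention logits depend only on the relative transform T_i^{-1} T_j, analogous to how RoPE makes logits depend only on relative token positions.

```python
import torch

def camera_relative_qk(q, k, T):
    # q, k: (views, dim) with dim divisible by 4; T: (views, 4, 4) camera poses.
    V, D = q.shape
    q4, k4 = q.view(V, D // 4, 4), k.view(V, D // 4, 4)
    Ti = torch.linalg.inv(T)
    q4 = torch.einsum('vba,vcb->vca', Ti, q4)   # q <- T_i^{-T} q, per 4-dim block
    k4 = torch.einsum('vab,vcb->vca', T, k4)    # k <- T_j k
    # Now q'_i . k'_j = sum over blocks of q^T (T_i^{-1} T_j) k: only the
    # relative camera transform survives in the attention logit.
    return q4.reshape(V, D), k4.reshape(V, D)
```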

David McAllister (@davidrmcall)'s Twitter Profile Photo

Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.
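
A hedged reading of "drop-in replacement" (simplified, not the paper's exact estimator): keep PPO's clipped objective but build the likelihood ratio from the conditional flow matching loss on each sampled action, since a lower CFM loss corresponds to a higher likelihood under the flow policy.

```python
import torch

def fpo_ratio(vel_net, vel_net_old, obs, action):
    # Share one (t, noise) draw between old and new nets to reduce variance.
    t = torch.rand(action.shape[0], 1)
    noise = torch.randn_like(action)
    x_t = (1 - t) * noise + t * action        # linear probability path
    target = action - noise                   # conditional velocity target
    l_new = ((vel_net(obs, x_t, t) - target) ** 2).mean(dim=-1)
    with torch.no_grad():
        l_old = ((vel_net_old(obs, x_t, t) - target) ** 2).mean(dim=-1)
    return torch.exp(l_old - l_new)           # use in place of PPO's Gaussian ratio
```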

Hongsuk Benjamin Choi (@redstone_hong)'s Twitter Profile Photo

🤖 Initial code release is up for VideoMimic Real2Sim! github.com/hongsukchoi/Vi…
VideoMimic is a real-to-sim-to-real pipeline for deploying humanoids in the real world. It supports:
- Human motion capture from video
- Environment reconstruction for simulation from video
-