Junyi Zhang (@junyi42)'s Twitter Profile
Junyi Zhang

@junyi42

CS Ph.D. student @Berkeley_AI.
B.Eng. in CS @SJTU1896.
Working with @GoogleDeepMind; previously @MSFTResearch.
Vision, generative models, representation learning.

ID: 1547104932918067200

Website: http://junyi42.com · Joined: 13-07-2022 06:25:47

122 Tweets

1.1K Followers

475 Following

Haocheng Xi (@haochengxiucb)'s Twitter Profile Photo

🚀 Introducing #SparseVideoGen: 2x speedup in video generation on HunyuanVideo with high pixel-level fidelity (PSNR = 29)! No training required, no perceptible difference to the human eye! Blog: svg-project.github.io Paper: arxiv.org/abs/2502.01776 Code:
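
As a rough illustration of where such a training-free speedup can come from: attention heads in a video transformer tend to focus either within a frame ("spatial") or on the same location across frames ("temporal"), and each pattern admits a sparse mask. The mask construction below is a simplified assumption for intuition, not SparseVideoGen's actual implementation.

```python
import torch

# Hypothetical sketch: spatial vs. temporal sparse attention for a video
# transformer with F frames of T tokens each (sequence length F*T).

def spatial_mask(F, T):
    frame = torch.arange(F * T) // T
    return frame[:, None] == frame[None, :]   # attend only within the same frame

def temporal_mask(F, T):
    pos = torch.arange(F * T) % T
    return pos[:, None] == pos[None, :]       # same spatial location, all frames

def masked_attention(q, k, v, mask):
    # q, k, v: (seq, dim); mask: (seq, seq) bool. A real sparse kernel would
    # avoid materializing dense scores; this dense version is just for clarity.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Presumably each head is assigned whichever pattern best matches its dense output on a cheap probe, which is what lets this work with no retraining.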

Hanwen Jiang (@hanwenjiang1)'s Twitter Profile Photo

Supervised learning has held 3D vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels:
❌ No supervision of cameras & geometry
✅ Just RGB images
And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy).
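
A hedged sketch of what "zero 3D labels" can mean operationally: hold out a view, let the model estimate its own cameras and scene representation, and supervise purely by re-rendering the held-out view. All method names here are placeholders, not RayZer's API.

```python
import torch

def self_supervised_step(model, images):
    # images: (batch, views, 3, H, W); reconstruct the last view from the rest.
    cams = model.predict_cameras(images)            # no COLMAP, no pose labels
    scene = model.build_scene(images[:, :-1], cams[:, :-1])
    pred = model.render(scene, cams[:, -1])         # re-render the held-out view
    return ((pred - images[:, -1]) ** 2).mean()     # photometric loss only
```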

Junyi Zhang (@junyi42)'s Twitter Profile Photo

Humanoids need to perceive the environment in the real world. Using 4D reconstruction techniques, we turn casual human videos into training data for an environment-aware humanoid policy. Super excited to share: VideoMimic.net

Chung Min Kim (@chungminkim)'s Twitter Profile Photo

Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU
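
For readers new to the problem, the classic building block such a toolkit generalizes and accelerates is the damped-least-squares IK step below; this is a textbook illustration, not PyRoki's API.

```python
import numpy as np

def ik_step(q, fk, jacobian, target, damping=1e-2):
    # q: joint angles; fk(q): end-effector position; jacobian(q): (task_dim, n_joints).
    err = target - fk(q)                           # task-space error
    J = jacobian(q)
    H = J.T @ J + damping * np.eye(J.shape[1])     # Levenberg-Marquardt damping
    return q + np.linalg.solve(H, J.T @ err)       # one damped Gauss-Newton step
```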

Max Fu (@letian_fu)'s Twitter Profile Photo

Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories, usable to train diffusion policies and VLA models.

Junyi Zhang (@junyi42)'s Twitter Profile Photo

Very impressive! At VideoMimic.net, we already learn from 3rd-person human videos + RL for locomotion. Excited to see where this path goes next!

Nate Gillman @ICLR'25 (@gillmanlab)'s Twitter Profile Photo

Ever wish you could turn your video generator into a controllable physics simulator? We're thrilled to introduce Force Prompting! Animate any image with physical forces and get fine-grained control, without needing any physics simulator or 3D assets at inference. 🧵(1/n)

Tianyuan Zhang (@tianyuanzhang99)'s Twitter Profile Photo

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper "Test-Time Training Done Right" proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch
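
To make "large-chunk test-time training" concrete, here is a hedged sketch of the general fast-weight pattern (a simplification, not LaCT's exact recipe): once per big chunk of tokens, take a gradient step that fits a small memory to the chunk's key→value pairs, then read the queries out through it.

```python
import torch

def ttt_chunk(W, K, V, Q, lr=1.0):
    # W: (dim, dim) fast-weight memory; K, V, Q: (chunk_len, dim).
    err = K @ W - V                   # how badly the memory maps keys to values
    grad = K.T @ err / K.shape[0]     # grad of 0.5 * mean_i ||k_i W - v_i||^2
    W = W - lr * grad                 # "update": one step per (large) chunk
    return Q @ W, W                   # "apply": read out, carry memory forward
```

Using chunks of thousands of tokens amortizes each update into large matmuls, which is plausibly what makes a pure-PyTorch implementation hardware-efficient.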

Grace Luo (@graceluo_)'s Twitter Profile Photo

✨New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation*, at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator.🧵

Seohong Park (@seohong_park)'s Twitter Profile Photo

Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).

Kyle Sargent (@kylesargentai)'s Twitter Profile Photo

FlowMo, our paper on diffusion autoencoders for image tokenization, has been accepted to #ICCV2025! See you in Hawaii! 🏄‍♂️

Yutong Bai (@yutongbai1002)'s Twitter Profile Photo

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have:
1) A real, physically grounded and complex action space—not just abstract control signals.
2) Diverse, real-life scenarios and activities.
Or in short: It has to

Ruilong Li (@ruilong_li)'s Twitter Profile Photo

For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
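
A hedged sketch of the core idea as described above (not the paper's exact formulation): transform queries and keys by each token's 4x4 camera matrix so that attention logits depend only on the relative transform T_i^{-1} T_j, analogous to how RoPE makes logits depend only on relative token positions.

```python
import torch

def camera_relative_qk(q, k, T):
    # q, k: (views, dim) with dim divisible by 4; T: (views, 4, 4) camera poses.
    V, D = q.shape
    q4, k4 = q.view(V, D // 4, 4), k.view(V, D // 4, 4)
    Ti = torch.linalg.inv(T)
    q4 = torch.einsum('vba,vcb->vca', Ti, q4)   # q <- T_i^{-T} q, per 4-dim block
    k4 = torch.einsum('vab,vcb->vca', T, k4)    # k <- T_j k
    # Now q'_i . k'_j = sum over blocks of q^T (T_i^{-1} T_j) k: only the
    # relative camera transform survives in the attention logit.
    return q4.reshape(V, D), k4.reshape(V, D)
```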

David McAllister (@davidrmcall)'s Twitter Profile Photo

Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.
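
A hedged reading of "drop-in replacement" (simplified, not the paper's exact estimator): keep PPO's clipped objective but build the likelihood ratio from the conditional flow matching loss on each sampled action, since a lower CFM loss corresponds to a higher likelihood under the flow policy.

```python
import torch

def fpo_ratio(vel_net, vel_net_old, obs, action):
    # Share one (t, noise) draw between old and new nets to reduce variance.
    t = torch.rand(action.shape[0], 1)
    noise = torch.randn_like(action)
    x_t = (1 - t) * noise + t * action        # linear probability path
    target = action - noise                   # conditional velocity target
    l_new = ((vel_net(obs, x_t, t) - target) ** 2).mean(dim=-1)
    with torch.no_grad():
        l_old = ((vel_net_old(obs, x_t, t) - target) ** 2).mean(dim=-1)
    return torch.exp(l_old - l_new)           # use in place of PPO's Gaussian ratio
```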

Hongsuk Benjamin Choi (@redstone_hong)'s Twitter Profile Photo

🤖 Initial code release is up for VideoMimic Real2Sim! github.com/hongsukchoi/Vi…
VideoMimic is a real-to-sim-to-real pipeline for deploying humanoids in the real world. It supports:
- Human motion capture from video
- Environment reconstruction for simulation from video
-