Phong Nguyen-Ha (@phongstormvn) 's Twitter Profile
Phong Nguyen-Ha

@phongstormvn

Senior Research Scientist at Qualcomm, ex-intern @ Meta | Nvidia

ID: 704466112725913600

Link: http://phongnhhn.info · Joined: 01-03-2016 00:39:44

1.1K Tweets

691 Followers

1.1K Following

Phillip Isola (@phillip_isola) 's Twitter Profile Photo

Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic:

Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired rep learning: arxiv.org/abs/2510.08492

1/9
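
For readers unfamiliar with what "alignment" between two models' representations means in practice, here is a minimal sketch using linear CKA, one common similarity index over paired features; the linked papers may well measure alignment differently.

```python
# Minimal sketch: score alignment between two models' features with linear CKA.
# This is one common metric, not necessarily the one used in the linked papers.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0, keepdims=True)   # centre each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Toy usage: features of the same 512 inputs from two hypothetical encoders.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(512, 768))            # encoder A
feats_b = feats_a @ rng.normal(size=(768, 384))  # a linear map of A -> high alignment
print(linear_cka(feats_a, feats_b))
```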

Chris Offner (@chrisoffner3d) 's Twitter Profile Photo

Is the terror reign of redundant scene representations ending? Where VGGT, CUT3R, and other recent models relied on godless redundant outputs (depth+points+pose) without guaranteeing internal prediction consistency, MapAnything and DepthAnything 3 are now heroically pushing back.

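For context on the redundancy being joked about: if a model predicts depth, a point map, and a camera pose separately, the point map should match depth unprojected through that pose. A small sketch of that consistency check follows; shapes and names are illustrative, not any particular model's output format.

```python
# Toy consistency check between separately predicted depth, pose, and point map.
import numpy as np

def unproject(depth, K, cam_to_world):
    """Unproject an HxW depth map into world-space points (HxWx3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                       # camera-space rays
    pts_cam = rays * depth[..., None]                     # scale rays by depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                              # camera -> world

def consistency_error(depth, K, cam_to_world, pointmap):
    """Mean distance between the predicted point map and the one implied by depth + pose."""
    implied = unproject(depth, K, cam_to_world)
    return float(np.linalg.norm(implied - pointmap, axis=-1).mean())
```
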
Yanpei Cao (@yanpei_cao) 's Twitter Profile Photo

3D generation often treats objects as single, monolithic shapes. Our new research, OmniPart, takes a different path: generating objects as assemblies of parts for structural coherence and fine-grained control. And it has been accepted to SIGGRAPH Asia 2025 🎉

Markus Schütz (@m_schuetz) 's Twitter Profile Photo

A really fun project we've been working on: Real-Time Rendering with JPEG textures! Ended up working way better than we expected (1500+ fps). Will be interesting extending this to modern formats like AVIF or JPEG XL. 

Paper: arxiv.org/abs/2510.08166
Code: github.com/elias1518693/j…
Saining Xie (@sainingxie) 's Twitter Profile Photo

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right.

today, we introduce Representation Autoencoders (RAE).

>> Retire VAEs. Use RAEs. 👇(1/n)
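
To make the idea concrete, here is a heavily simplified sketch of training a denoiser in the latent space of a frozen representation encoder, with a small decoder learned to map features back to pixels. The modules are toy stand-ins, not the RAE architecture.

```python
# Rough sketch: diffusion in the latent space of a frozen representation encoder.
# ToyEncoder / ToyDecoder / denoiser are placeholders, not the paper's models.
import torch, torch.nn as nn

class ToyEncoder(nn.Module):            # stand-in for a frozen pretrained encoder
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 16, stride=16), nn.GELU())
    def forward(self, x):               # (B,3,256,256) -> (B,dim,16,16)
        return self.net(x)

class ToyDecoder(nn.Module):            # trained to map features back to pixels
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.ConvTranspose2d(dim, 3, 16, stride=16)
    def forward(self, z):
        return self.net(z)

encoder, decoder = ToyEncoder().eval(), ToyDecoder()
for p in encoder.parameters():
    p.requires_grad_(False)             # the representation encoder stays frozen

denoiser = nn.Conv2d(256, 256, 3, padding=1)   # placeholder for the DiT backbone

imgs = torch.randn(4, 3, 256, 256)
with torch.no_grad():
    z0 = encoder(imgs)                  # clean representation latents
rec_loss = ((decoder(z0) - imgs) ** 2).mean()   # decoder reconstruction objective

t = torch.rand(4, 1, 1, 1)
noise = torch.randn_like(z0)
zt = (1 - t) * z0 + t * noise           # simple linear noising schedule
diff_loss = ((denoiser(zt) - noise) ** 2).mean()   # standard noise-prediction loss
(rec_loss + diff_loss).backward()
```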
DailyPapers (@huggingpapers) 's Twitter Profile Photo

Pixel-space generative models hit new SOTA with EPG

AMAP, Alibaba, NVIDIA & Caltech introduce EPG, a novel two-stage training framework that achieves state-of-the-art pixel-space diffusion (FID 2.04 on ImageNet-256 with 75 NFE) and consistency models (FID 8.82 in 1 step).
MrNeRF (@janusch_patas) 's Twitter Profile Photo

MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference

Contributions:
• We propose a novel approach to modeling reflections through Gaussian Splatting with multi-view consistent material inference. This includes a multi-view material…

Francesco Capuano (@_fracapuano) 's Twitter Profile Photo

A comprehensive, hands-on tutorial on the most recent advancements in robotics 🤟 ...with self-contained explanations of modern techniques for end-to-end robot learning & ready-to-use code examples using LeRobot and Hugging Face. Now available everywhere! 🤗

@wimmerthomas.bsky.social (@wimmer_th) 's Twitter Profile Photo

Super excited to introduce ✨ AnyUp: Universal Feature Upsampling 🔎 Upsample any feature - really any feature - with the same upsampler, no need for cumbersome retraining. SOTA feature upsampling results while being feature-agnostic at inference time.
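
For context, the baseline setting AnyUp improves on is plain resolution-matching of coarse backbone features, e.g. bilinear interpolation as in the sketch below. This only shows the problem setup, not the learned upsampler itself.

```python
# Problem setting sketch: backbone features are much coarser than the image, and
# naive bilinear upsampling blurs boundaries. AnyUp replaces this step with a
# learned, feature-agnostic upsampler; only the baseline is shown here.
import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 448, 448)          # input image
feats = torch.randn(1, 768, 32, 32)          # e.g. ViT patch features, any backbone / dim

upsampled = F.interpolate(feats, size=image.shape[-2:], mode="bilinear", align_corners=False)
print(upsampled.shape)                       # torch.Size([1, 768, 448, 448])
```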

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Veo is getting a major upgrade. 🚀 We’re rolling out Veo 3.1, our updated video generation model, alongside improved creative controls for filmmakers, storytellers, and developers - many of them with audio. 🧵

Quankai Gao (@uuuuusher) 's Twitter Profile Photo

🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion.
✅ Python + GPU-optimized implementation, no C++ anymore!
✅ 40× faster than COLMAP with 5K images on single GPU!
✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)!
✅ Support metric scale.
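
For context, the core quantity any SfM / bundle-adjustment pipeline minimizes is the reprojection error; below is a vectorized PyTorch sketch of that residual so it can run on GPU. This is the textbook formulation, not InstantSfM's implementation.

```python
# Textbook reprojection residual, written in PyTorch so a GPU solver can iterate on it.
import torch

def reprojection_residuals(points_w, cam_R, cam_t, K, observations):
    """points_w: (N,3) world points, cam_R: (3,3), cam_t: (3,), K: (3,3) intrinsics,
    observations: (N,2) measured pixel locations in this camera."""
    pts_cam = points_w @ cam_R.T + cam_t          # world -> camera
    proj = pts_cam @ K.T                          # apply intrinsics
    pix = proj[:, :2] / proj[:, 2:3]              # perspective divide
    return pix - observations                     # (N,2) residuals

# Toy usage: residuals are differentiable w.r.t. points (and poses), so a GPU
# least-squares solver can minimize them.
pts = torch.randn(1000, 3, requires_grad=True)
R, t = torch.eye(3), torch.tensor([0.0, 0.0, 5.0])
K = torch.tensor([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
obs = torch.randn(1000, 2)
loss = reprojection_residuals(pts, R, t, K, obs).pow(2).mean()
loss.backward()
```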
Prune Truong (@prunetruong) 's Twitter Profile Photo

🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator.

➡️ Paper: arxiv.org/abs/2510.13454
➡️ Website: gohyojun15.github.io/VIST3A/

Collaboration between ETH & Google with Hyojun Go, Dominik Narnhofer, Goutam Bhat, Federico Tombari, and Konrad Schindler.

Hansheng Chen (@hanshengch) 's Twitter Profile Photo

Excited to announce a new track of accelerating Generative AI:

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation 
github.com/Lakonik/piFlow

Distill 20B flow models now using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality.
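
To give a flavor of "distillation with just an L2 loss", here is a generic few-step distillation sketch in which a student's single big step is regressed onto the endpoint of a teacher's multi-step integration. This is not the pi-Flow algorithm, only the broad setup the tweet refers to.

```python
# Generic few-step flow distillation: L2 between the student's one-step output
# and the teacher's multi-step Euler rollout. Not the pi-Flow method itself.
import torch, torch.nn as nn

dim = 16
teacher = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)).eval()
student = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

def velocity(net, x, t):
    return net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

x = torch.randn(32, dim)                      # noise samples
with torch.no_grad():                         # teacher: many small Euler steps
    xt, steps = x.clone(), 20
    for i in range(steps):
        t = torch.tensor([[i / steps]])
        xt = xt + velocity(teacher, xt, t) / steps
    target = xt

pred = x + velocity(student, x, torch.tensor([[0.0]]))   # student: one big step
loss = ((pred - target) ** 2).mean()                      # plain L2 imitation loss
loss.backward()
```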
Haithem Turki (@haithem_turki) 's Twitter Profile Photo

[1/N] Excited to introduce "SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms." We extend 3DGUT with LiDAR support and render a wide range of sensors 10-20x faster than ray tracing and 1.5-10x faster than prior rasterization work. research.nvidia.com/labs/sil/proje…
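
For readers unfamiliar with the unscented transform that 3DGUT and SimULi build on, here is the textbook version: sigma points pushed through a nonlinear map to recover a mean and covariance. It is shown standalone, separate from the renderer itself.

```python
# Textbook unscented transform: propagate a Gaussian through a nonlinear function.
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f."""
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n
    sqrt_cov = np.linalg.cholesky((n + lam) * cov)
    # 2n + 1 sigma points: the mean plus symmetric offsets along covariance axes.
    sigma = np.vstack([mean, mean + sqrt_cov.T, mean - sqrt_cov.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    y = np.array([f(s) for s in sigma])                 # push each point through f
    mean_y = wm @ y
    diff = y - mean_y
    cov_y = (wc[:, None] * diff).T @ diff
    return mean_y, cov_y

# Toy usage: project a 3D Gaussian through a pinhole-style nonlinearity.
project = lambda p: p[:2] / p[2]
m, C = unscented_transform(np.array([0.1, -0.2, 4.0]), 0.05 * np.eye(3), project)
```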

Hadi AlZayer (@hadizayer) 's Twitter Profile Photo

what if you could combine diffusion models instantly? You would get exponentially better control (for free!!👀) This is exactly what we do. In ✨ coupled diffusion sampling ✨, diffusion models guide each other. The result? Diverse editing capabilities!
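
A toy sketch of the general idea of two diffusion models guiding each other during sampling is below; the coupling rule is invented purely for illustration and is not the paper's scheme.

```python
# Toy coupled sampling: each chain takes a step driven partly by its own model's
# noise estimate and partly by the partner's. The mixing rule here is illustrative only.
import torch, torch.nn as nn

model_a = nn.Linear(8, 8)     # placeholder denoisers predicting eps(x)
model_b = nn.Linear(8, 8)

xa, xb = torch.randn(1, 8), torch.randn(1, 8)
steps, w = 50, 0.3            # w = how strongly the other model's prediction leaks in
with torch.no_grad():
    for _ in range(steps):
        eps_a, eps_b = model_a(xa), model_b(xb)
        xa = xa - (1.0 / steps) * ((1 - w) * eps_a + w * eps_b)
        xb = xb - (1.0 / steps) * ((1 - w) * eps_b + w * eps_a)
```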

MrNeRF (@janusch_patas) 's Twitter Profile Photo

2DGS-R: Revisiting the Normal Consistency Regularization in 2D Gaussian Splatting

Contributions:
• We conduct extensive experiments to assess the impact of incorporating normal consistency (NC) on the 2DGS attributes. Based on our findings, we propose a hierarchical training…
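
For context, a normal-consistency term of this kind typically compares normals derived from the rendered depth with the rendered splat normals. A simplified image-space sketch, not the paper's exact formulation:

```python
# Simplified normal-consistency loss: depth-derived normals vs rendered normals.
import torch
import torch.nn.functional as F

def normals_from_depth(depth):
    """depth: (1,1,H,W) -> unit normals (1,3,H,W) via finite differences
    (orthographic-style approximation for brevity)."""
    dzdx = depth[..., :, 1:] - depth[..., :, :-1]
    dzdy = depth[..., 1:, :] - depth[..., :-1, :]
    dzdx = F.pad(dzdx, (0, 1, 0, 0))
    dzdy = F.pad(dzdy, (0, 0, 0, 1))
    n = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)

def normal_consistency_loss(depth, rendered_normals):
    """1 - cosine similarity between depth-derived and rendered normals."""
    return (1.0 - (normals_from_depth(depth) * rendered_normals).sum(dim=1)).mean()

depth = torch.rand(1, 1, 64, 64, requires_grad=True)
normals = F.normalize(torch.rand(1, 3, 64, 64), dim=1)
normal_consistency_loss(depth, normals).backward()
```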
MrNeRF (@janusch_patas) 's Twitter Profile Photo

Advances in 4D Representation: Geometry, Motion, and Interaction

Abstract (excerpt)
Instead of offering an exhaustive enumeration of many works, we take a more selective approach by focusing on representative works to highlight both the desirable properties and ensuing…
Kwang Moo Yi (@kwangmoo_yi) 's Twitter Profile Photo

Mao et al., "PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis"

While not perfect, video models do an okay job of creating novel views. Use them to "bridge" between extreme views for pose estimation.
Yanjiang Guo (@gyanjiang) 's Twitter Profile Photo

Rollouts in the real world are slow and expensive. What if we could rollout trajectories entirely inside a world model (WM)? Introducing 🚀Ctrl-World🚀, a generative manipulation WM that can interact with advanced VLA policy in imagination. 🧵1/6
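
The loop being described, reduced to a toy sketch with placeholder modules standing in for the VLA policy and the generative world model:

```python
# Toy "rollout in imagination": the policy acts on observations the world model
# keeps predicting forward, with no real-robot execution. Placeholder modules only.
import torch, torch.nn as nn

obs_dim, act_dim = 32, 7
policy = nn.Linear(obs_dim, act_dim)                 # stand-in for a VLA policy
world_model = nn.Linear(obs_dim + act_dim, obs_dim)  # stand-in for the generative WM

obs = torch.randn(1, obs_dim)                        # initial observation
trajectory = []
with torch.no_grad():
    for step in range(16):                           # entire rollout happens in imagination
        action = policy(obs)
        obs = world_model(torch.cat([obs, action], dim=-1))
        trajectory.append((obs, action))
# the imagined trajectory can then be scored, or used to evaluate / improve the policy
```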