Phong Nguyen-Ha (@phongstormvn) 's Twitter Profile
Phong Nguyen-Ha

@phongstormvn

Senior Research Scientist at Qualcomm, ex-intern @ Meta | Nvidia

ID: 704466112725913600

Link: http://phongnhhn.info · Joined: 01-03-2016 00:39:44

1.1K Tweets

691 Followers

1.1K Following

Phillip Isola (@phillip_isola) 's Twitter Profile Photo

Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic:

Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired rep learning: arxiv.org/abs/2510.08492

1/9
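
For readers unfamiliar with what "alignment" between two models' representations means in practice, here is a minimal sketch using linear CKA, one common similarity index over paired features; the linked papers may well measure alignment differently.

```python
# Minimal sketch: score alignment between two models' features with linear CKA.
# This is one common metric, not necessarily the one used in the linked papers.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0, keepdims=True)   # centre each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Toy usage: features of the same 512 inputs from two hypothetical encoders.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(512, 768))            # encoder A
feats_b = feats_a @ rng.normal(size=(768, 384))  # a linear map of A -> high alignment
print(linear_cka(feats_a, feats_b))
```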

Chris Offner (@chrisoffner3d) 's Twitter Profile Photo

Is the terror reign of redundant scene representations ending? Where VGGT, CUT3R, and other recent models relied on godless redundant outputs (depth+points+pose) without guaranteeing internal prediction consistency, MapAnything and DepthAnything 3 are now heroically pushing back.

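For context on the redundancy being joked about: if a model predicts depth, a point map, and a camera pose separately, the point map should match depth unprojected through that pose. A small sketch of that consistency check follows; shapes and names are illustrative, not any particular model's output format.

```python
# Toy consistency check between separately predicted depth, pose, and point map.
import numpy as np

def unproject(depth, K, cam_to_world):
    """Unproject an HxW depth map into world-space points (HxWx3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                       # camera-space rays
    pts_cam = rays * depth[..., None]                     # scale rays by depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                              # camera -> world

def consistency_error(depth, K, cam_to_world, pointmap):
    """Mean distance between the predicted point map and the one implied by depth + pose."""
    implied = unproject(depth, K, cam_to_world)
    return float(np.linalg.norm(implied - pointmap, axis=-1).mean())
```
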
Yanpei Cao (@yanpei_cao) 's Twitter Profile Photo

3D generation often treats objects as single, monolithic shapes. Our new research, OmniPart, takes a different path: generating objects as assemblies of parts for structural coherence and fine-grained control. And it has been accepted to SIGGRAPH Asia 2025 🎉

Markus Schütz (@m_schuetz) 's Twitter Profile Photo

A really fun project we've been working on: Real-Time Rendering with JPEG textures! Ended up working way better than we expected (1500+ fps). Will be interesting extending this to modern formats like AVIF or JPEG XL. 

Paper: arxiv.org/abs/2510.08166
Code: github.com/elias1518693/j…
Saining Xie (@sainingxie) 's Twitter Profile Photo

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right.

today, we introduce Representation Autoencoders (RAE).

>> Retire VAEs. Use RAEs. 👇(1/n)
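
To make the idea concrete, here is a heavily simplified sketch of training a denoiser in the latent space of a frozen representation encoder, with a small decoder learned to map features back to pixels. The modules are toy stand-ins, not the RAE architecture.

```python
# Rough sketch: diffusion in the latent space of a frozen representation encoder.
# ToyEncoder / ToyDecoder / denoiser are placeholders, not the paper's models.
import torch, torch.nn as nn

class ToyEncoder(nn.Module):            # stand-in for a frozen pretrained encoder
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 16, stride=16), nn.GELU())
    def forward(self, x):               # (B,3,256,256) -> (B,dim,16,16)
        return self.net(x)

class ToyDecoder(nn.Module):            # trained to map features back to pixels
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.ConvTranspose2d(dim, 3, 16, stride=16)
    def forward(self, z):
        return self.net(z)

encoder, decoder = ToyEncoder().eval(), ToyDecoder()
for p in encoder.parameters():
    p.requires_grad_(False)             # the representation encoder stays frozen

denoiser = nn.Conv2d(256, 256, 3, padding=1)   # placeholder for the DiT backbone

imgs = torch.randn(4, 3, 256, 256)
with torch.no_grad():
    z0 = encoder(imgs)                  # clean representation latents
rec_loss = ((decoder(z0) - imgs) ** 2).mean()   # decoder reconstruction objective

t = torch.rand(4, 1, 1, 1)
noise = torch.randn_like(z0)
zt = (1 - t) * z0 + t * noise           # simple linear noising schedule
diff_loss = ((denoiser(zt) - noise) ** 2).mean()   # standard noise-prediction loss
(rec_loss + diff_loss).backward()
```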
DailyPapers (@huggingpapers) 's Twitter Profile Photo

Pixel-space generative models hit new SOTA with EPG

AMAP, Alibaba, NVIDIA & Caltech introduce EPG, a novel two-stage training framework that achieves state-of-the-art pixel-space diffusion (FID 2.04 on ImageNet-256 with 75 NFE) and consistency models (FID 8.82 in 1 step).
MrNeRF (@janusch_patas) 's Twitter Profile Photo

MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference

Contributions:
• We propose a novel approach to modeling reflections through Gaussian Splatting with multi-view consistent material inference. This includes a multi-view material…

Francesco Capuano (@_fracapuano) 's Twitter Profile Photo

A comprehensive, hands-on tutorial on the most recent advancements in robotics 🤟 ...with self-contained explanations of modern techniques for end-to-end robot learning & ready-to-use code examples using LeRobot and Hugging Face. Now available everywhere! 🤗

@wimmerthomas.bsky.social (@wimmer_th) 's Twitter Profile Photo

Super excited to introduce ✨ AnyUp: Universal Feature Upsampling 🔎 Upsample any feature - really any feature - with the same upsampler, no need for cumbersome retraining. SOTA feature upsampling results while being feature-agnostic at inference time.
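
For context, the baseline setting AnyUp improves on is plain resolution-matching of coarse backbone features, e.g. bilinear interpolation as in the sketch below. This only shows the problem setup, not the learned upsampler itself.

```python
# Problem setting sketch: backbone features are much coarser than the image, and
# naive bilinear upsampling blurs boundaries. AnyUp replaces this step with a
# learned, feature-agnostic upsampler; only the baseline is shown here.
import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 448, 448)          # input image
feats = torch.randn(1, 768, 32, 32)          # e.g. ViT patch features, any backbone / dim

upsampled = F.interpolate(feats, size=image.shape[-2:], mode="bilinear", align_corners=False)
print(upsampled.shape)                       # torch.Size([1, 768, 448, 448])
```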

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Veo is getting a major upgrade. 🚀 We’re rolling out Veo 3.1, our updated video generation model, alongside improved creative controls for filmmakers, storytellers, and developers - many of them with audio. 🧵

Quankai Gao (@uuuuusher) 's Twitter Profile Photo

🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion.
✅ Python + GPU-optimized implementation, no C++ anymore!
✅ 40× faster than COLMAP with 5K images on single GPU!
✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)!
✅ Support metric scale.
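
For context, the core quantity any SfM / bundle-adjustment pipeline minimizes is the reprojection error; below is a vectorized PyTorch sketch of that residual so it can run on GPU. This is the textbook formulation, not InstantSfM's implementation.

```python
# Textbook reprojection residual, written in PyTorch so a GPU solver can iterate on it.
import torch

def reprojection_residuals(points_w, cam_R, cam_t, K, observations):
    """points_w: (N,3) world points, cam_R: (3,3), cam_t: (3,), K: (3,3) intrinsics,
    observations: (N,2) measured pixel locations in this camera."""
    pts_cam = points_w @ cam_R.T + cam_t          # world -> camera
    proj = pts_cam @ K.T                          # apply intrinsics
    pix = proj[:, :2] / proj[:, 2:3]              # perspective divide
    return pix - observations                     # (N,2) residuals

# Toy usage: residuals are differentiable w.r.t. points (and poses), so a GPU
# least-squares solver can minimize them.
pts = torch.randn(1000, 3, requires_grad=True)
R, t = torch.eye(3), torch.tensor([0.0, 0.0, 5.0])
K = torch.tensor([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
obs = torch.randn(1000, 2)
loss = reprojection_residuals(pts, R, t, K, obs).pow(2).mean()
loss.backward()
```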
Prune Truong (@prunetruong) 's Twitter Profile Photo

🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator.

➡️ Paper: arxiv.org/abs/2510.13454
➡️ Website: gohyojun15.github.io/VIST3A/

Collaboration between ETH & Google with Hyojun Go, Dominik Narnhofer, Goutam Bhat, Federico Tombari, and Konrad Schindler.

Hansheng Chen (@hanshengch) 's Twitter Profile Photo

Excited to announce a new track of accelerating Generative AI:

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation 
github.com/Lakonik/piFlow

Distill 20B flow models now using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality.
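
To give a flavor of "distillation with just an L2 loss", here is a generic few-step distillation sketch in which a student's single big step is regressed onto the endpoint of a teacher's multi-step integration. This is not the pi-Flow algorithm, only the broad setup the tweet refers to.

```python
# Generic few-step flow distillation: L2 between the student's one-step output
# and the teacher's multi-step Euler rollout. Not the pi-Flow method itself.
import torch, torch.nn as nn

dim = 16
teacher = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)).eval()
student = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

def velocity(net, x, t):
    return net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

x = torch.randn(32, dim)                      # noise samples
with torch.no_grad():                         # teacher: many small Euler steps
    xt, steps = x.clone(), 20
    for i in range(steps):
        t = torch.tensor([[i / steps]])
        xt = xt + velocity(teacher, xt, t) / steps
    target = xt

pred = x + velocity(student, x, torch.tensor([[0.0]]))   # student: one big step
loss = ((pred - target) ** 2).mean()                      # plain L2 imitation loss
loss.backward()
```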
Haithem Turki (@haithem_turki) 's Twitter Profile Photo

[1/N] Excited to introduce "SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms." We extend 3DGUT with LiDAR support and render a wide range of sensors 10-20x faster than ray tracing and 1.5-10x faster than prior rasterization work. research.nvidia.com/labs/sil/proje…
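
For readers unfamiliar with the unscented transform that 3DGUT and SimULi build on, here is the textbook version: sigma points pushed through a nonlinear map to recover a mean and covariance. It is shown standalone, separate from the renderer itself.

```python
# Textbook unscented transform: propagate a Gaussian through a nonlinear function.
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f."""
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n
    sqrt_cov = np.linalg.cholesky((n + lam) * cov)
    # 2n + 1 sigma points: the mean plus symmetric offsets along covariance axes.
    sigma = np.vstack([mean, mean + sqrt_cov.T, mean - sqrt_cov.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    y = np.array([f(s) for s in sigma])                 # push each point through f
    mean_y = wm @ y
    diff = y - mean_y
    cov_y = (wc[:, None] * diff).T @ diff
    return mean_y, cov_y

# Toy usage: project a 3D Gaussian through a pinhole-style nonlinearity.
project = lambda p: p[:2] / p[2]
m, C = unscented_transform(np.array([0.1, -0.2, 4.0]), 0.05 * np.eye(3), project)
```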

Hadi AlZayer (@hadizayer) 's Twitter Profile Photo

what if you could combine diffusion models instantly? You would get exponentially better control (for free!!👀) This is exactly what we do. In ✨ coupled diffusion sampling ✨, diffusion models guide each other. The result? Diverse editing capabilities!
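
A toy sketch of the general idea of two diffusion models guiding each other during sampling is below; the coupling rule is invented purely for illustration and is not the paper's scheme.

```python
# Toy coupled sampling: each chain takes a step driven partly by its own model's
# noise estimate and partly by the partner's. The mixing rule here is illustrative only.
import torch, torch.nn as nn

model_a = nn.Linear(8, 8)     # placeholder denoisers predicting eps(x)
model_b = nn.Linear(8, 8)

xa, xb = torch.randn(1, 8), torch.randn(1, 8)
steps, w = 50, 0.3            # w = how strongly the other model's prediction leaks in
with torch.no_grad():
    for _ in range(steps):
        eps_a, eps_b = model_a(xa), model_b(xb)
        xa = xa - (1.0 / steps) * ((1 - w) * eps_a + w * eps_b)
        xb = xb - (1.0 / steps) * ((1 - w) * eps_b + w * eps_a)
```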

MrNeRF (@janusch_patas) 's Twitter Profile Photo

2DGS-R: Revisiting the Normal Consistency Regularization in 2D Gaussian Splatting

Contributions:
• We conduct extensive experiments to assess the impact of incorporating normal consistency (NC) on the 2DGS attributes. Based on our findings, we propose a hierarchical training…
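
For context, a normal-consistency term of this kind typically compares normals derived from the rendered depth with the rendered splat normals. A simplified image-space sketch, not the paper's exact formulation:

```python
# Simplified normal-consistency loss: depth-derived normals vs rendered normals.
import torch
import torch.nn.functional as F

def normals_from_depth(depth):
    """depth: (1,1,H,W) -> unit normals (1,3,H,W) via finite differences
    (orthographic-style approximation for brevity)."""
    dzdx = depth[..., :, 1:] - depth[..., :, :-1]
    dzdy = depth[..., 1:, :] - depth[..., :-1, :]
    dzdx = F.pad(dzdx, (0, 1, 0, 0))
    dzdy = F.pad(dzdy, (0, 0, 0, 1))
    n = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)

def normal_consistency_loss(depth, rendered_normals):
    """1 - cosine similarity between depth-derived and rendered normals."""
    return (1.0 - (normals_from_depth(depth) * rendered_normals).sum(dim=1)).mean()

depth = torch.rand(1, 1, 64, 64, requires_grad=True)
normals = F.normalize(torch.rand(1, 3, 64, 64), dim=1)
normal_consistency_loss(depth, normals).backward()
```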
MrNeRF (@janusch_patas) 's Twitter Profile Photo

Advances in 4D Representation: Geometry, Motion, and Interaction

Abstract (excerpt)
Instead of offering an exhaustive enumeration of many works, we take a more selective approach by focusing on representative works to highlight both the desirable properties and ensuing…
Kwang Moo Yi (@kwangmoo_yi) 's Twitter Profile Photo

Mao et al., "PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis"

While not perfect, video models do an okay job of creating novel views. Use them to "bridge" between extreme views for pose estimation.
Yanjiang Guo (@gyanjiang) 's Twitter Profile Photo

Rollouts in the real world are slow and expensive. What if we could rollout trajectories entirely inside a world model (WM)? Introducing 🚀Ctrl-World🚀, a generative manipulation WM that can interact with advanced VLA policy in imagination. 🧵1/6
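
The loop being described, reduced to a toy sketch with placeholder modules standing in for the VLA policy and the generative world model:

```python
# Toy "rollout in imagination": the policy acts on observations the world model
# keeps predicting forward, with no real-robot execution. Placeholder modules only.
import torch, torch.nn as nn

obs_dim, act_dim = 32, 7
policy = nn.Linear(obs_dim, act_dim)                 # stand-in for a VLA policy
world_model = nn.Linear(obs_dim + act_dim, obs_dim)  # stand-in for the generative WM

obs = torch.randn(1, obs_dim)                        # initial observation
trajectory = []
with torch.no_grad():
    for step in range(16):                           # entire rollout happens in imagination
        action = policy(obs)
        obs = world_model(torch.cat([obs, action], dim=-1))
        trajectory.append((obs, action))
# the imagined trajectory can then be scored, or used to evaluate / improve the policy
```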