Jiarui Xu (@jerry_xu_jiarui) 's Twitter Profile
Jiarui Xu

@jerry_xu_jiarui

Final-year Ph.D. student at UC San Diego
Undergraduate from HKUST

ID: 745524777662582785

Link: http://jerryxu.net · Joined: 22-06-2016 07:52:13

89 Tweets

1.1K Followers

528 Following

Alex Nichol (@unixpickle) 's Twitter Profile Photo

I investigated this in 2017 and even at the time it looked encouraging. Glad to see it make a comeback. github.com/unixpickle/sgd…

Xinlei Chen (@endernewton) 's Twitter Profile Photo

Very happy to see the TTT-series reaching yet another milestone! This time it serves as an inspiration for next-generation architecture post-Transformer, and by connecting TTT to Transformer, it can explain why (autoregressive) Transformers are so good at in-context learning!

Yann Dubois (@yanndubs) 's Twitter Profile Photo

🔥 New language modelling layer in town:
- more expressive than RNNs
- more efficient (linear compute) than attention!

Key perspective: LM layers are ML models trained to memorize the tokens in a sequence:
- Linear memorizer => RNN
- Kernel memorizer => attention
- Neural memorizer => our layer
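
The "neural memorizer" idea above can be illustrated with a toy layer whose hidden state is itself a tiny linear model, trained online with one gradient step per token to memorize that token. This is only a hedged sketch of the concept, not the paper's actual layer; the function name, reconstruction objective, and learning rate are illustrative assumptions.

```python
import numpy as np

def ttt_layer(tokens, d, lr=0.5):
    """Toy 'neural memorizer' sequence layer (illustrative only).

    The hidden state is a small linear model W, trained online to map
    each token to itself (memorization). The output at each step is W
    applied to the current token, so later tokens are processed with a
    state that has already 'learned' the earlier ones.
    """
    W = np.zeros((d, d))
    outputs = []
    for x in tokens:
        y = W @ x                  # read from the learned memory
        grad = np.outer(y - x, x)  # gradient of 0.5*||W x - x||^2 wrt W
        W -= lr * grad             # one online training step
        outputs.append(y)
    return np.stack(outputs)
```

With this framing, a linear-regression memorizer recovers an RNN-style recurrence and a kernel memorizer recovers attention; the sketch above is the "neural" case in its simplest (linear-model) form.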

Jiarui Xu (@jerry_xu_jiarui) 's Twitter Profile Photo

Thinking about a PhD? Don’t miss the chance to work with Elliott / Shangzhe Wu! He’s not only a brilliant researcher but also an inspiring mentor and collaborator. Excited to see the amazing projects his new team will bring to life! 🌟

Omer Bar Tal (@omerbartal) 's Twitter Profile Photo

Meet Pika 2.0! Besides improved quality and motion, our new model can embed user-provided concepts into the generated videos, without any training! Combined with an unprecedented level of text-alignment, you can now create YOUR OWN personalized content with minimal effort 🎬

SifeiL (@sifei30488l) 's Twitter Profile Photo

Introducing GSPN: A Leap Forward in Vision Attention Mechanisms
Paper: arxiv.org/pdf/2501.12381
Project: whj363636.github.io/GSPN/

We present GSPN (Generalized Spatial Propagation Network), a novel attention mechanism developed at NVIDIA. Unlike pixel-to-pixel scans like Mamba,

Yinbo Chen (@yinbochen) 's Twitter Profile Photo

Introducing “Diffusion Autoencoders are Scalable Image Tokenizers” (DiTo).

We show that with proper designs and scaling up, diffusion autoencoders (a single L2 loss) can outperform the GAN-LPIPS tokenizers (hybrid losses) used in current SOTA generative models. (1/4)

Yuzhe Qin (@qinyuzhe) 's Twitter Profile Photo

Meet our first general-purpose robot at Dexmate: dexmate.ai/vega

Adjustable height from 0.66m to 2.2m: compact enough for an SUV, tall enough to reach those impossible high shelves. Powerful dual arms (15 lbs payload each) and omni-directional mobility for ultimate

Karan Dalal (@karansdalal) 's Twitter Profile Photo

Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by

Gashon Hussein (@gashonhussein) 's Twitter Profile Photo

Excited to share our new paper, "One-Minute Video Generation with Test-Time Training (TTT)" in collaboration with NVIDIA.

We augment a pre-trained Transformer with TTT-layers and finetune it to generate one-minute Tom and Jerry cartoons with strong temporal and spatial

Xiaolong Wang (@xiaolonw) 's Twitter Profile Photo

Test-Time Training (TTT) is now on Video! And not just a 5-second video. We can generate a full 1-min video! TTT module is an RNN module that provides an explicit and efficient memory mechanism. It models the hidden state of an RNN with a machine learning model, which is updated
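
The mechanism described above, an RNN whose hidden state is the weights of a small model updated at test time, can be contrasted with a standard linear RNN in a toy sketch. This is illustrative only, not the paper's implementation; the function names, shapes, reconstruction loss, and learning rate are assumptions made for the example.

```python
import numpy as np

def linear_rnn_step(h, x, A, B):
    """Standard linear RNN: the memory is a fixed-size vector h."""
    return A @ h + B @ x

def ttt_step(W, x, lr=0.5):
    """TTT-style update: the 'memory' is the weight matrix W of a tiny
    model, trained at test time by one gradient step of a
    self-supervised reconstruction loss 0.5 * ||W x - x||^2."""
    grad = np.outer(W @ x - x, x)  # dL/dW for the loss above
    return W - lr * grad
```

The key difference: the linear RNN compresses history into a vector via a fixed linear map, while the TTT step compresses history into model weights via gradient descent, giving an explicit memory that improves the longer the sequence it has seen.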