Jiaxin Shi (@thjashin)'s Twitter Profile
Jiaxin Shi

@thjashin

Research Scientist @GoogleDeepMind | prev @Stanford @MSRNE @VectorInst @RIKEN_AIP_EN @Tsinghua_Uni. Building probabilistic & algorithmic models for learning.

ID: 702089336842375169

Link: http://jiaxins.io · Joined: 23-02-2016 11:15:16

575 Tweets

3.3K Followers

345 Following

Songlin Yang (@songlinyang4)'s Twitter Profile Photo


🚀 Announcing ASAP: asap-seminar.github.io!

A fully virtual seminar bridging theory, algorithms, and systems to tackle fundamental challenges in Transformers.

Co-organized by Simran Arora, Xinyu Yang, and Han Guo.

Our first speaker: Alex Wang on Test-time Regression
Jiaxin Shi (@thjashin)'s Twitter Profile Photo

Got quite a few likes for this one! For those following advances in scalable Gaussian processes: check out these notable concurrent papers that introduce tighter sparse variational bounds, one by Thang Bui & co and the other from the same Michalis Titsias whose 2009 paper laid
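For context, the 2009 result alluded to is Titsias's collapsed variational lower bound for sparse GP regression; a sketch of the bound as usually stated (notation assumed here, see the paper for exact details — the "tighter" concurrent bounds improve on this objective):

```latex
% Titsias (2009) collapsed variational lower bound for sparse GP regression.
% K_nn: full kernel matrix; K_nm, K_mm: cross- and inducing-point kernels.
\log p(\mathbf{y})
  \ge \log \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{0},\, \mathbf{Q}_{nn} + \sigma^2 \mathbf{I}\right)
  - \frac{1}{2\sigma^2} \operatorname{tr}\!\left(\mathbf{K}_{nn} - \mathbf{Q}_{nn}\right),
\qquad
\mathbf{Q}_{nn} = \mathbf{K}_{nm} \mathbf{K}_{mm}^{-1} \mathbf{K}_{mn}.
```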

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Linear Attention and Beyond: Interactive Tutorial with Songlin Yang (MIT / Flash Linear Attention). I didn’t follow some of the recent results, so I Zoomed with Songlin and she explained it all to me for two hours 😂 youtu.be/d0HJvGSWw8A

TTIC (@ttic_connect)'s Twitter Profile Photo

Friday, February 28th at 2pm CT: Talks at TTIC presents Jiaxin Shi of Google DeepMind with a talk titled "Discrete Generative Modeling with Masked Diffusions." Please join us in Room 530, 5th floor.

Chongxuan Li (@lichongxuan)'s Twitter Profile Photo

We have released the code and models for LLaDA at github.com/ML-GSAI/LLaDA. Thanks Shen for providing a very detailed tutorial on how to train your own LLaDA, plus FAQs.

COPSS (@copssnews)'s Twitter Profile Photo

🙌🎉 Our 2025 recipient of the COPSS Presidents' Award is Lester Mackey! This award is given annually to a young member of the statistical community in recognition of outstanding contributions to the profession of statistics.

Dimitri von Rütte (@dvruette)'s Twitter Profile Photo

🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12

Jiaming Song (@baaadas)'s Twitter Profile Photo

As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.

Yuanzhi (@yuanzhi_zhu)'s Twitter Profile Photo

Masked Diffusion Models (MDMs) are a hot topic in generative AI 🔥 — powerful but slow due to multiple sampling steps. We (École polytechnique and Inria) introduce Di[M]O — a novel approach to distill MDMs into a one-step generator without sacrificing quality.

Jiaxin Shi (@thjashin)'s Twitter Profile Photo

Looks like a 7B version of arxiv.org/abs/2410.17891: MD4-style training and sampling, with architecture adaptation from a pretrained AR LLM — anneal from causal to bidirectional attention.
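The "anneal from causal to bidirectional attention" idea could be sketched roughly as follows — a hypothetical helper, not the paper's actual code; the real schedule and masking implementation will differ:

```python
import numpy as np

def annealed_attention_mask(seq_len: int, t: float) -> np.ndarray:
    """Soft attention mask blending causal (t=0) into bidirectional (t=1).

    Hypothetical sketch of annealing from an AR (causal) attention pattern
    to the fully bidirectional pattern used by masked diffusion models.
    """
    # Past and self positions: always visible, as in causal attention.
    causal = np.tril(np.ones((seq_len, seq_len)))
    # Future positions: faded in gradually as the anneal coefficient t grows.
    future = np.triu(np.ones((seq_len, seq_len)), k=1)
    return causal + t * future
```

In practice such a mask would multiply attention probabilities (or its log would be added to attention logits), with t ramped from 0 to 1 over adaptation training.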

Jiaxin Shi (@thjashin)'s Twitter Profile Photo

We are hiring a student researcher at Google DeepMind to work on fundamental problems in discrete generative modeling! Examples of our recent work: masked diffusion (arxiv.org/abs/2406.04329), learning-order AR (arxiv.org/abs/2503.05979). If you find this interesting, please send an

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Amazing interview with Yang Song, one of the key researchers we have to thank for diffusion models. The most important lesson IMO: be fearless! The community's view on score matching was quite pessimistic at the time -- he went against the grain and got it to work at scale!

antoine dedieu (@antoine_dedieu)'s Twitter Profile Photo

My colleagues Joe Ortiz, Swaroop Guntupalli and Kevin Patrick Murphy will be in Singapore 🇸🇬 to present our recent work at the exciting World Models Workshop! Do not hesitate to swing by and ask them questions

Mengyue Yang ✈️ ICLR 2025 (@mengyue_yang_)'s Twitter Profile Photo


Such a wonderful #ICLR25 World Models workshop! Amazing speakers, panelists, guests, passionate audience, and dedicated organizers!

By the way, why is there a 4-picture limit on posts?

#WorldModels
David Pfau (@pfau)'s Twitter Profile Photo

New paper accepted to ICML! We present a novel policy optimization algorithm for continuous control with a simple closed form which generalizes DDPG, SAC etc. to generic stochastic policies: Wasserstein Policy Optimization (WPO).

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Here's the third and final part of Slater Stich's "History of diffusion" interview series! The other two interviewees' research played a pivotal role in the rise of diffusion models, whereas I just like to yap about them 😬 this was a wonderful opportunity to do exactly that!