Jiaxin Shi (@thjashin)'s Twitter Profile
Jiaxin Shi

@thjashin

Research Scientist @GoogleDeepMind | prev @Stanford @MSRNE @VectorInst @RIKEN_AIP_EN @Tsinghua_Uni. Building probabilistic & algorithmic models for learning.

ID: 702089336842375169

Link: http://jiaxins.io · Joined: 23-02-2016 11:15:16

575 Tweets

3.3K Followers

345 Following

Songlin Yang (@songlinyang4)'s Twitter Profile Photo


🚀 Announcing ASAP: asap-seminar.github.io!

A fully virtual seminar bridging theory, algorithms, and systems to tackle fundamental challenges in Transformers.

Co-organized by Simran Arora, Xinyu Yang, and Han Guo.

Our first speaker: Alex Wang on Test-time Regression
Jiaxin Shi (@thjashin)'s Twitter Profile Photo

Got quite a few likes for this one! For those following advances in scalable Gaussian processes: check out these notable concurrent papers that introduce tighter sparse variational bounds, one by Thang Bui & co and the other from the same Michalis Titsias whose 2009 paper laid
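For context, the 2009 result alluded to is Titsias's collapsed variational lower bound for sparse GP regression; a sketch of the bound as usually stated (notation assumed here, see the paper for exact details — the "tighter" concurrent bounds improve on this objective):

```latex
% Titsias (2009) collapsed variational lower bound for sparse GP regression.
% K_nn: full kernel matrix; K_nm, K_mm: cross- and inducing-point kernels.
\log p(\mathbf{y})
  \ge \log \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{0},\, \mathbf{Q}_{nn} + \sigma^2 \mathbf{I}\right)
  - \frac{1}{2\sigma^2} \operatorname{tr}\!\left(\mathbf{K}_{nn} - \mathbf{Q}_{nn}\right),
\qquad
\mathbf{Q}_{nn} = \mathbf{K}_{nm} \mathbf{K}_{mm}^{-1} \mathbf{K}_{mn}.
```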

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Linear Attention and Beyond: Interactive Tutorial with Songlin Yang (MIT / Flash Linear Attention). I didn’t follow some of the recent results, so I Zoomed with Songlin and she explained it all to me for two hours 😂 youtu.be/d0HJvGSWw8A

TTIC (@ttic_connect)'s Twitter Profile Photo

Friday, February 28th at 2pm CT: Talks at TTIC presents Jiaxin Shi of Google DeepMind with a talk titled "Discrete Generative Modeling with Masked Diffusions." Please join us in Room 530, 5th floor.

Chongxuan Li (@lichongxuan)'s Twitter Profile Photo

We have released the code and models for LLaDA at github.com/ML-GSAI/LLaDA. Thanks Shen for providing a very detailed tutorial on how to train your own LLaDA, plus FAQs.

COPSS (@copssnews)'s Twitter Profile Photo

🙌🎉 Our 2025 recipient of the COPSS Presidents' Award is Lester Mackey! This award is given annually to a young member of the statistical community in recognition of outstanding contributions to the profession of statistics.

Dimitri von Rütte (@dvruette)'s Twitter Profile Photo

🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12

Jiaming Song (@baaadas)'s Twitter Profile Photo

As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.

Yuanzhi (@yuanzhi_zhu)'s Twitter Profile Photo

Masked Diffusion Models (MDMs) are a hot topic in generative AI 🔥 — powerful but slow due to multiple sampling steps. We (École polytechnique and Inria) introduce Di[M]O — a novel approach to distill MDMs into a one-step generator without sacrificing quality.

Jiaxin Shi (@thjashin)'s Twitter Profile Photo

Looks like a 7B version of arxiv.org/abs/2410.17891: MD4-style training and sampling, with architecture adaptation from a pretrained AR LLM — anneal from causal to bidirectional attention.
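The "anneal from causal to bidirectional attention" idea could be sketched roughly as follows — a hypothetical helper, not the paper's actual code; the real schedule and masking implementation will differ:

```python
import numpy as np

def annealed_attention_mask(seq_len: int, t: float) -> np.ndarray:
    """Soft attention mask blending causal (t=0) into bidirectional (t=1).

    Hypothetical sketch of annealing from an AR (causal) attention pattern
    to the fully bidirectional pattern used by masked diffusion models.
    """
    # Past and self positions: always visible, as in causal attention.
    causal = np.tril(np.ones((seq_len, seq_len)))
    # Future positions: faded in gradually as the anneal coefficient t grows.
    future = np.triu(np.ones((seq_len, seq_len)), k=1)
    return causal + t * future
```

In practice such a mask would multiply attention probabilities (or its log would be added to attention logits), with t ramped from 0 to 1 over adaptation training.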

Jiaxin Shi (@thjashin)'s Twitter Profile Photo

We are hiring a student researcher at Google DeepMind to work on fundamental problems in discrete generative modeling! Examples of our recent work: masked diffusion (arxiv.org/abs/2406.04329), learning-order AR (arxiv.org/abs/2503.05979). If you find this interesting, please send an

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Amazing interview with Yang Song, one of the key researchers we have to thank for diffusion models. The most important lesson IMO: be fearless! The community's view on score matching was quite pessimistic at the time -- he went against the grain and got it to work at scale!

antoine dedieu (@antoine_dedieu)'s Twitter Profile Photo

My colleagues Joe Ortiz, Swaroop Guntupalli and Kevin Patrick Murphy will be in Singapore 🇸🇬 to present our recent work at the exciting World Models Workshop! Do not hesitate to swing by and ask them questions

Mengyue Yang ✈️ ICLR 2025 (@mengyue_yang_)'s Twitter Profile Photo


Such a wonderful #ICLR25 World Models workshop! Amazing speakers, panelists, guests, passionate audience, and dedicated organizers!

By the way, why is there a 4-picture limit on posts?

#WorldModels
David Pfau (@pfau)'s Twitter Profile Photo

New paper accepted to ICML! We present a novel policy optimization algorithm for continuous control with a simple closed form which generalizes DDPG, SAC etc. to generic stochastic policies: Wasserstein Policy Optimization (WPO).

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Here's the third and final part of Slater Stich's "History of diffusion" interview series! The other two interviewees' research played a pivotal role in the rise of diffusion models, whereas I just like to yap about them 😬 this was a wonderful opportunity to do exactly that!