Justin Deschenaux (@jdeschena) 's Twitter Profile
Justin Deschenaux

@jdeschena

PhD student @EPFL working with Caglar Gulcehre. Casting the forces of gradient descent 🧙‍♂️ Working on diffusion language models ⚡️

ID: 1462676204

https://jdeschena.github.io · Joined: 27-05-2013 17:16:58

322 Tweets

371 Followers

490 Following

Subham Sahoo (@ssahoo_) 's Twitter Profile Photo

🚨 [New paper alert] Esoteric Language Models (Eso-LMs)

First Diffusion LM to support KV caching w/o compromising parallel generation.

🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥
🚀 65× faster than MDLM
⚡ 4× faster than Block Diffusion

📜 Paper:
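
For readers less familiar with the mechanism the tweet highlights, here is a minimal, generic sketch of KV caching in attention decoding, written in plain PyTorch with toy dimensions and a single head. It illustrates only the standard cache (each new token attends against stored keys and values instead of re-encoding the whole prefix); it is not the Eso-LM scheme itself, whose contribution is making such caching compatible with parallel diffusion-style generation.

```python
import torch

# Generic KV-cache sketch (assumptions: single head, toy width d=64,
# random projection matrices; illustrative only, not Eso-LMs).
d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(h_new):
    """h_new: (1, d) hidden state of the newly generated position."""
    q = h_new @ W_q
    k_cache.append(h_new @ W_k)          # store this position's key...
    v_cache.append(h_new @ W_v)          # ...and value, once, in the cache
    K = torch.cat(k_cache, dim=0)        # (t, d) keys of all positions so far
    V = torch.cat(v_cache, dim=0)
    attn = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return attn @ V                      # (1, d) attended output

for _ in range(5):                       # each step reuses, rather than recomputes, the prefix
    out = decode_step(torch.randn(1, d))
```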
Chris Wendler (@wendlerch) 's Twitter Profile Photo

How do diffusion models create images, and can we control that process? We are excited to release an update to our SDXL Turbo sparse autoencoder paper. New title: One Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models. Spoiler: We have FLUX SAEs now :)
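
For context on what an "SDXL Turbo / FLUX SAE" is: a sparse autoencoder reconstructs a model's internal activations through a wide, sparsely activated bottleneck, so that individual latents tend to correspond to interpretable features one can inspect or steer. The sketch below is a generic TopK-style SAE in PyTorch; the widths, the TopK sparsity mechanism, and the plain MSE objective are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Generic TopK sparse autoencoder over cached activations (illustrative)."""
    def __init__(self, d_model=1280, n_latents=16384, k=32):  # toy sizes (assumptions)
        super().__init__()
        self.enc = nn.Linear(d_model, n_latents)
        self.dec = nn.Linear(n_latents, d_model)
        self.k = k

    def forward(self, h):
        z = F.relu(self.enc(h))                     # non-negative feature activations
        top = torch.topk(z, self.k, dim=-1)         # keep only the k largest features
        z_sparse = torch.zeros_like(z).scatter_(-1, top.indices, top.values)
        return self.dec(z_sparse), z_sparse         # reconstruction + sparse code

sae = SparseAutoencoder()
h = torch.randn(8, 1280)                            # stand-in for diffusion-model activations
h_hat, z = sae(h)
loss = F.mse_loss(h_hat, h)                         # reconstruction objective (assumption)
```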

Mikhail Terekhov (@miterekhov) 's Twitter Profile Photo

AI Control is a promising approach for mitigating misalignment risks, but will it be widely adopted? The answer depends on cost. Our new paper introduces the Control Tax—how much does it cost to run the control protocols? (1/8) 🧵

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

The Diffusion Duality "The arg max operation transforms Gaussian diffusion into Uniform-state diffusion" Adapts consistency distillation to diffusion language models, unlocking few-step generation by accelerating sampling by two orders of magnitude. Introduces a curriculum

The Diffusion Duality 

"The arg max operation transforms Gaussian diffusion into Uniform-state diffusion"

Adapts consistency distillation to diffusion language models, unlocking few-step generation and accelerating sampling by two orders of magnitude.

Introduces a curriculum
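
The quoted claim - that the arg max turns Gaussian diffusion into uniform-state discrete diffusion - can be checked numerically. The toy snippet below is my own illustration, not code from the paper: the vocabulary size, noise level, and one-hot embedding are assumptions, and it only shows the marginal behaviour (the clean token survives with some probability, otherwise the arg max lands roughly uniformly on the rest of the vocabulary), not the consistency-distillation curriculum the paper builds on top of it.

```python
import torch

V = 8                        # toy vocabulary size (assumption)
n_samples = 200_000
alpha, sigma = 0.7, 1.0      # Gaussian diffusion signal / noise levels (assumption)

x = torch.zeros(V)
x[3] = 1.0                   # clean token = index 3, as a one-hot embedding

noise = torch.randn(n_samples, V)
w_t = alpha * x + sigma * noise       # Gaussian forward diffusion of the one-hot
tokens = w_t.argmax(dim=-1)           # arg max collapses the latent back to a token

probs = torch.bincount(tokens, minlength=V).float() / n_samples
print(probs)   # mass concentrated on index 3; the rest is ~uniform over the other tokens
```

As sigma grows relative to alpha, the clean token's survival probability shrinks and the arg max distribution approaches uniform, which is the relationship the paper exploits to transfer Gaussian-side tools such as consistency distillation to the discrete side.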
Subham Sahoo (@ssahoo_) 's Twitter Profile Photo

🚨 “The Diffusion Duality” is out at the International Conference on Machine Learning (ICML)! ⚡️

Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion.

🦾 Beats AR on 3/7 zero-shot likelihood benchmarks.

📄 Paper: arxiv.org/abs/2506.10892
💻 Code: github.com/s-sahoo/duo 🧠

Aaron Gokaslan (@skyli0n) 's Twitter Profile Photo

Check out our recent paper on the "duality" between discrete and Gaussian diffusion. We show how you can exploit that relationship to massively speed up discrete diffusion by two orders of magnitude.

AK (@_akhaliq) 's Twitter Profile Photo

The Diffusion Duality: unlocks few-step generation in discrete diffusion language models via the underlying Gaussian diffusion.

Sander Dieleman (@sedielem) 's Twitter Profile Photo

This work uncovers a profound connection between continuous and discrete (non-absorbing) diffusion models, allowing transfer of advanced techniques such as consistency distillation to the discrete setting! Also: amazing title, no notes! 🧑‍🍳😙🤌

Jonathan Whitaker (@johnowhitaker) 's Twitter Profile Photo

I did another video on the paper 'The Diffusion Duality', continuing the series of me trying to understand diffusion applied to language models :) Link: youtube.com/watch?v=o_ISAl… I shied away from some of the scarier math - hope my hand-waving is still vaguely useful + correct!

Xiuying Wei@Neurips (Wed11am East #2010) (@xiuyingwei966) 's Twitter Profile Photo

Curious about making Transformers faster on long sequences without compromising accuracy? ⚡️🧠 Meet RAT—an intermediate design between RNN and softmax attention. The results? Faster and lighter like RNNs, with strong performance like Attention! 🐭✨
