Aaron Gokaslan (@skyli0n) 's Twitter Profile
Aaron Gokaslan

@skyli0n

Maker of OpenWebText. @Mozilla Rise25. @PyTorch Core Reviewer. PhD Candidate at @Cornell. Previously @FacebookAI and @BrownUniversity. Graduating May 2025.

ID: 3514528095

Link: https://skylion007.github.io/ · Joined: 01-09-2015 17:12:26

2.2K Tweets

3.3K Followers

419 Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

The Diffusion Duality 

"The arg max operation transforms Gaussian diffusion into Uniform-state diffusion"

Adapts consistency distillation to diffusion language models, unlocking few-step generation by accelerating sampling by two orders of magnitude.

Introduces a curriculum
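
A toy numerical illustration of the quoted claim (my own sketch, not code from the paper): take a one-hot token embedding, perturb it with Gaussian noise of growing scale, and apply arg max. At low noise the arg max recovers the original token; at high noise it approaches a uniform draw over the vocabulary, which is the behaviour of a uniform-state discrete forward process.

```python
# Toy check of "arg max turns Gaussian diffusion into uniform-state diffusion".
# Illustrative sketch only; vocabulary size and noise schedule are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
V = 8               # vocabulary size
x0 = 3              # index of the clean token
e = np.eye(V)[x0]   # one-hot embedding of the clean token

for sigma in [0.1, 0.5, 1.0, 3.0, 10.0]:
    # Gaussian forward process: x_t = x_0 + sigma * noise
    samples = e + sigma * rng.standard_normal((100_000, V))
    tokens = samples.argmax(axis=1)     # discretize via arg max
    p_keep = (tokens == x0).mean()      # prob. the original token survives
    print(f"sigma={sigma:5.1f}  P[argmax = x0] = {p_keep:.3f}  (uniform = {1/V:.3f})")
```

As sigma grows, P[argmax = x0] decays toward 1/V, i.e. the discretized state interpolates between the clean token and a uniform category, matching the duality the tweet describes.
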
tenderizzation (@tenderizzation) 's Twitter Profile Photo

the extra pages when you request a buffer from CUDACachingAllocator that is two pages smaller than the largest free cached block and you didn’t set PYTORCH_CUDA_ALLOC_CONF=“expandable_segments:True”
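
For context on the knob in the joke: with expandable segments the caching allocator can grow an existing segment instead of carving out a fresh block when a request doesn't match any cached size, which reduces exactly this kind of fragmentation. A minimal sketch of setting it (the env var must be set before CUDA is initialized; the allocation below is an arbitrary placeholder):

```python
import os
# Must be set before torch initializes CUDA for the setting to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

if torch.cuda.is_available():
    # Arbitrary demo allocation; with expandable segments the allocator can
    # extend a segment rather than reserving a new one for a near-miss request.
    a = torch.empty(1024, 1024, device="cuda")
    print(torch.cuda.memory_summary(abbreviated=True))
```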

Aaron Gokaslan (@skyli0n) 's Twitter Profile Photo

Check out our recent paper on the "duality" between discrete and Gaussian diffusion. We show how you can exploit that relationship to massively speed up discrete diffusion by two orders of magnitude.

Phillip Isola (@phillip_isola) 's Twitter Profile Photo

Our computer vision textbook is now available for free online here: visionbook.mit.edu We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful and feel free to submit Github issues to help us improve the text!

The AI Timeline (@theaitimeline) 's Twitter Profile Photo

The Diffusion Duality

Author's Explanation:
x.com/ssahoo_/status…

Overview:
Duo narrows the performance gap of uniform-state discrete diffusion models for text generation by leveraging their inherent connection to underlying Gaussian diffusion.

This method introduces a

AK (@_akhaliq) 's Twitter Profile Photo

The Diffusion Duality unlocks few-step generation in discrete diffusion language models via the underlying Gaussian diffusion

Charlie Marsh (@charliermarsh) 's Twitter Profile Photo

The Python Steering Council has voted to remove the "experimental" label from the free-threaded ("nogil") builds for Python 3.14. Big step towards making them the default in a future version of CPython!
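
For readers who want to try the free-threaded builds, here is a small probe of my own (assuming CPython 3.13+, where the private helper `sys._is_gil_enabled()` was added; older interpreters simply report a GIL):

```python
import sys
import sysconfig

# Was this interpreter compiled with free-threading ("nogil") support?
built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# On free-threaded builds the GIL can still be re-enabled at runtime
# (e.g. by incompatible extension modules), so check the live state too.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {built_free_threaded}, GIL currently enabled: {gil_enabled}")
```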

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

(replying to elie) I'm not 100% sure about that. As an example I was just browsing through the DCLM-baseline datamix (which is ~SOTA) and it is *terrible*. Compared to what I could in principle imagine. Major concessions are made in data quality to gather enough data quantity.

Matthew Leavitt (@leavittron) 's Twitter Profile Photo

Don't trust your lying eyes! The DCLM paper has a great section on human-model quality assessment alignment that shows that humans are pretty bad at assessing data quality arxiv.org/pdf/2406.11794…. I'm often surprised at what makes it into our good vs. rejected token piles

DailyPapers (@huggingpapers) 's Twitter Profile Photo

Check out "The Diffusion Duality" on HF papers! huggingface.co/papers/2506.10… Also see the author's collection: huggingface.co/collections/s-…

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

Thrilled to collaborate on the launch of 📚 CommonPile v0.1 📚 !

Introducing the largest openly-licensed LLM pretraining corpus (8 TB), led by Nikhil Kandpal (@kandpal_nikhil), Brian Lester (@blester125), and Colin Raffel (@colinraffel).

📜: arxiv.org/pdf/2506.05209
📚🤖 Data & models: huggingface.co/common-pile
1/
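
A hedged sketch of how one might stream the corpus from the Hugging Face hub with the `datasets` library; the repository id below is a placeholder I made up, so check the huggingface.co/common-pile collection for the real dataset names:

```python
from datasets import load_dataset

# NOTE: "common-pile/comma_v0.1_training_dataset" is a hypothetical repo id;
# look up the actual identifiers in the huggingface.co/common-pile collection.
ds = load_dataset(
    "common-pile/comma_v0.1_training_dataset",  # placeholder id
    split="train",
    streaming=True,  # 8 TB corpus: stream rather than download
)

for i, example in enumerate(ds):
    print(example.get("text", "")[:200])
    if i >= 2:
        break
```
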
Andy Konwinski (@andykonwinski) 's Twitter Profile Photo

Laude Institute is a non-profit that gives the right resources to the right researchers at the right time. We help more researchers go from idea to impact. laude.org

Stas Bekman (@stasbekman) 's Twitter Profile Photo

As I've been diving into sequence/context parallelism over the last few days, I wanted to share this write-up in 2 parts that nicely compares the approaches out there, and some of their combinations, with links to papers: p1: insujang.github.io/2024-01-11/ten… p2: insujang.github.io/2024-09-20/int…
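
The core idea those posts compare, in a single-process toy sketch of my own: shard activations along the sequence dimension, run position-wise layers locally on each shard, and note that attention is the step forcing communication between shards (the approaches in the write-ups differ mainly in how that exchange is organized, e.g. ring attention vs. all-gather of K/V):

```python
import torch

world_size = 4                      # pretend number of context-parallel ranks
x = torch.randn(2, 4096, 1024)      # [batch, seq, hidden] full activations

# Context/sequence parallelism: each rank holds a contiguous slice of the
# sequence dimension and applies position-wise layers (MLP, norms) locally.
shards = list(x.chunk(world_size, dim=1))            # 4 x [2, 1024, 1024]
local_out = [torch.nn.functional.gelu(s) for s in shards]

# Attention needs keys/values from every position, so real implementations
# exchange K/V across ranks; here we just reassemble to show the layout.
full = torch.cat(local_out, dim=1)
print(full.shape)  # torch.Size([2, 4096, 1024])
```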

Guilherme Penedo (@gui_penedo) 's Twitter Profile Photo

We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset.

Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
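
The tweet names rehydration but does not spell out the mechanics, so here is a heavily hedged sketch of what deduplication-based upsampling could look like: keep one copy per duplicate cluster, remember how many near-duplicates it had, and use that count to upweight it in the training mix. The function, field names, and cap are my own illustrative assumptions, not the FineWeb2 implementation.

```python
from collections import Counter

def rehydrate(docs, max_repeat=8):
    """Deduplicate, then upsample each surviving doc by its duplicate count.

    `docs` is a list of (doc_id, cluster_id) pairs, where cluster_id groups
    near-duplicates; names and the repeat cap are illustrative assumptions.
    """
    cluster_sizes = Counter(cluster for _, cluster in docs)
    seen, mix = set(), []
    for doc_id, cluster in docs:
        if cluster in seen:
            continue                      # drop the extra copies (dedup)
        seen.add(cluster)
        repeats = min(cluster_sizes[cluster], max_repeat)
        mix.extend([doc_id] * repeats)    # "rehydrate": upweight by duplicate count
    return mix

print(rehydrate([("a", 1), ("b", 1), ("c", 2), ("d", 1)]))
# ['a', 'a', 'a', 'c'] under these toy inputs
```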