Aaron Gokaslan (@skyli0n) 's Twitter Profile
Aaron Gokaslan

@skyli0n

Maker of OpenWebText. @Mozilla Rise25. @PyTorch Core Reviewer. PhD Candidate at @Cornell. Previously @FacebookAI and @BrownUniversity. Graduating May 2025.

ID: 3514528095

Link: https://skylion007.github.io/ · Joined: 01-09-2015 17:12:26

2.2K Tweets

3.3K Followers

419 Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

The Diffusion Duality 

"The arg max operation transforms Gaussian diffusion into Uniform-state diffusion"

Adapts consistency distillation to diffusion language models, unlocking few-step generation by accelerating sampling by two orders of magnitude.

Introduces a curriculum
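
A toy numerical illustration of the quoted claim (my own sketch, not code from the paper): take a one-hot token embedding, perturb it with Gaussian noise of growing scale, and apply arg max. At low noise the arg max recovers the original token; at high noise it approaches a uniform draw over the vocabulary, which is the behaviour of a uniform-state discrete forward process.

```python
# Toy check of "arg max turns Gaussian diffusion into uniform-state diffusion".
# Illustrative sketch only; vocabulary size and noise schedule are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
V = 8               # vocabulary size
x0 = 3              # index of the clean token
e = np.eye(V)[x0]   # one-hot embedding of the clean token

for sigma in [0.1, 0.5, 1.0, 3.0, 10.0]:
    # Gaussian forward process: x_t = x_0 + sigma * noise
    samples = e + sigma * rng.standard_normal((100_000, V))
    tokens = samples.argmax(axis=1)     # discretize via arg max
    p_keep = (tokens == x0).mean()      # prob. the original token survives
    print(f"sigma={sigma:5.1f}  P[argmax = x0] = {p_keep:.3f}  (uniform = {1/V:.3f})")
```

As sigma grows, P[argmax = x0] decays toward 1/V, i.e. the discretized state interpolates between the clean token and a uniform category, matching the duality the tweet describes.
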
tenderizzation (@tenderizzation) 's Twitter Profile Photo

the extra pages when you request a buffer from CUDACachingAllocator that is two pages smaller than the largest free cached block and you didn’t set PYTORCH_CUDA_ALLOC_CONF=“expandable_segments:True”
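
For context on the knob in the joke: with expandable segments the caching allocator can grow an existing segment instead of carving out a fresh block when a request doesn't match any cached size, which reduces exactly this kind of fragmentation. A minimal sketch of setting it (the env var must be set before CUDA is initialized; the allocation below is an arbitrary placeholder):

```python
import os
# Must be set before torch initializes CUDA for the setting to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

if torch.cuda.is_available():
    # Arbitrary demo allocation; with expandable segments the allocator can
    # extend a segment rather than reserving a new one for a near-miss request.
    a = torch.empty(1024, 1024, device="cuda")
    print(torch.cuda.memory_summary(abbreviated=True))
```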

Aaron Gokaslan (@skyli0n) 's Twitter Profile Photo

Check out our recent paper on the "duality" between discrete and Gaussian diffusion. We show how you can exploit that relationship to massively speed up discrete diffusion by two orders of magnitude.

Phillip Isola (@phillip_isola) 's Twitter Profile Photo

Our computer vision textbook is now available for free online here: visionbook.mit.edu We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful and feel free to submit Github issues to help us improve the text!

The AI Timeline (@theaitimeline) 's Twitter Profile Photo

The Diffusion Duality

Author's Explanation:
x.com/ssahoo_/status…

Overview:
Duo narrows the performance gap of uniform-state discrete diffusion models for text generation by leveraging their inherent connection to underlying Gaussian diffusion.

This method introduces a

AK (@_akhaliq) 's Twitter Profile Photo

The Diffusion Duality unlocks few-step generation in discrete diffusion language models via the underlying Gaussian diffusion

Charlie Marsh (@charliermarsh) 's Twitter Profile Photo

The Python Steering Council has voted to remove the "experimental" label from the free-threaded ("nogil") builds for Python 3.14. Big step towards making them the default in a future version of CPython!
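
For readers who want to try the free-threaded builds, here is a small probe of my own (assuming CPython 3.13+, where the private helper `sys._is_gil_enabled()` was added; older interpreters simply report a GIL):

```python
import sys
import sysconfig

# Was this interpreter compiled with free-threading ("nogil") support?
built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# On free-threaded builds the GIL can still be re-enabled at runtime
# (e.g. by incompatible extension modules), so check the live state too.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {built_free_threaded}, GIL currently enabled: {gil_enabled}")
```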

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

(replying to elie) I'm not 100% sure about that. As an example I was just browsing through the DCLM-baseline datamix (which is ~SOTA) and it is *terrible*. Compared to what I could in principle imagine. Major concessions are made in data quality to gather enough data quantity.

Matthew Leavitt (@leavittron) 's Twitter Profile Photo

Don't trust your lying eyes! The DCLM paper has a great section on human-model quality assessment alignment that shows that humans are pretty bad at assessing data quality arxiv.org/pdf/2406.11794…. I'm often surprised at what makes it into our good vs. rejected token piles

DailyPapers (@huggingpapers) 's Twitter Profile Photo

Check out "The Diffusion Duality" on HF papers! huggingface.co/papers/2506.10… Also see the author's collection: huggingface.co/collections/s-…

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

Thrilled to collaborate on the launch of 📚 CommonPile v0.1 📚 !

Introducing the largest openly-licensed LLM pretraining corpus (8 TB), led by Nikhil Kandpal (@kandpal_nikhil), Brian Lester (@blester125), and Colin Raffel (@colinraffel).

📜: arxiv.org/pdf/2506.05209
📚🤖 Data & models: huggingface.co/common-pile
1/
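
A hedged sketch of how one might stream the corpus from the Hugging Face hub with the `datasets` library; the repository id below is a placeholder I made up, so check the huggingface.co/common-pile collection for the real dataset names:

```python
from datasets import load_dataset

# NOTE: "common-pile/comma_v0.1_training_dataset" is a hypothetical repo id;
# look up the actual identifiers in the huggingface.co/common-pile collection.
ds = load_dataset(
    "common-pile/comma_v0.1_training_dataset",  # placeholder id
    split="train",
    streaming=True,  # 8 TB corpus: stream rather than download
)

for i, example in enumerate(ds):
    print(example.get("text", "")[:200])
    if i >= 2:
        break
```
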
Andy Konwinski (@andykonwinski) 's Twitter Profile Photo

Laude Institute is a non-profit that gives the right resources to the right researchers at the right time. We help more researchers go from idea to impact. laude.org

Stas Bekman (@stasbekman) 's Twitter Profile Photo

As I've been diving into sequence/context parallelism over the last few days, I wanted to share this write-up in 2 parts that nicely compares the approaches out there, and some of their combinations, with links to papers: p1: insujang.github.io/2024-01-11/ten… p2: insujang.github.io/2024-09-20/int…
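
The core idea those posts compare, in a single-process toy sketch of my own: shard activations along the sequence dimension, run position-wise layers locally on each shard, and note that attention is the step forcing communication between shards (the approaches in the write-ups differ mainly in how that exchange is organized, e.g. ring attention vs. all-gather of K/V):

```python
import torch

world_size = 4                      # pretend number of context-parallel ranks
x = torch.randn(2, 4096, 1024)      # [batch, seq, hidden] full activations

# Context/sequence parallelism: each rank holds a contiguous slice of the
# sequence dimension and applies position-wise layers (MLP, norms) locally.
shards = list(x.chunk(world_size, dim=1))            # 4 x [2, 1024, 1024]
local_out = [torch.nn.functional.gelu(s) for s in shards]

# Attention needs keys/values from every position, so real implementations
# exchange K/V across ranks; here we just reassemble to show the layout.
full = torch.cat(local_out, dim=1)
print(full.shape)  # torch.Size([2, 4096, 1024])
```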

Guilherme Penedo (@gui_penedo) 's Twitter Profile Photo

We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset.

Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
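
The tweet names rehydration but does not spell out the mechanics, so here is a heavily hedged sketch of what deduplication-based upsampling could look like: keep one copy per duplicate cluster, remember how many near-duplicates it had, and use that count to upweight it in the training mix. The function, field names, and cap are my own illustrative assumptions, not the FineWeb2 implementation.

```python
from collections import Counter

def rehydrate(docs, max_repeat=8):
    """Deduplicate, then upsample each surviving doc by its duplicate count.

    `docs` is a list of (doc_id, cluster_id) pairs, where cluster_id groups
    near-duplicates; names and the repeat cap are illustrative assumptions.
    """
    cluster_sizes = Counter(cluster for _, cluster in docs)
    seen, mix = set(), []
    for doc_id, cluster in docs:
        if cluster in seen:
            continue                      # drop the extra copies (dedup)
        seen.add(cluster)
        repeats = min(cluster_sizes[cluster], max_repeat)
        mix.extend([doc_id] * repeats)    # "rehydrate": upweight by duplicate count
    return mix

print(rehydrate([("a", 1), ("b", 1), ("c", 2), ("d", 1)]))
# ['a', 'a', 'a', 'c'] under these toy inputs
```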