Zachary Horvitz (@zachary_horvitz) 's Twitter Profile
Zachary Horvitz

@zachary_horvitz

CS PhD student at @Columbia. Prev research lead @RadAI, @BrownUniversity

ID: 1425384838635327499

11-08-2021 09:13:38

281 Tweets

367 Followers

667 Following

Subham Sahoo (@ssahoo_) 's Twitter Profile Photo

🔥 Rethinking Reasoning (with Diffusion LLMs) 

This work changes how you think about reasoning in LLMs.

🤯 Turns out: you don’t need the full chain-of-thought — only a small subset of CoT tokens actually matter for the final answer. 

❌ Autoregressive LLMs can’t exploit this
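
For a concrete sense of the claim (not the paper's method), a toy sketch: ablate random subsets of the chain-of-thought and see which dropped tokens tend to flip the final answer. The `generate_answer` stand-in and all numbers here are hypothetical.

```python
# Toy sketch, NOT the paper's method: estimate which chain-of-thought (CoT)
# tokens matter by ablating random subsets and checking if the answer flips.
import random

# Hypothetical stand-in for an LLM call: the "answer" only depends on CoT
# token "step-3", mimicking the claim that few CoT tokens carry the answer.
def generate_answer(question: str, cot: list[str]) -> str:
    return "42" if "step-3" in cot else "unknown"

def important_cot_tokens(question: str, cot: list[str], trials: int = 200) -> set[int]:
    baseline = generate_answer(question, cot)
    flips = [0] * len(cot)
    drops = [0] * len(cot)
    for _ in range(trials):
        kept = [i for i in range(len(cot)) if random.random() < 0.5]
        answer = generate_answer(question, [cot[i] for i in kept])
        for i in set(range(len(cot))) - set(kept):
            drops[i] += 1
            if answer != baseline:
                flips[i] += 1
    # "Important" = dropping the token almost always flips the answer.
    return {i for i in range(len(cot)) if drops[i] and flips[i] / drops[i] > 0.9}

cot = [f"step-{k}" for k in range(8)]
print(important_cot_tokens("toy question", cot))  # -> {3} in this toy setup
```
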
Subham Sahoo (@ssahoo_) 's Twitter Profile Photo

Overwhelmed by the number of Diffusion LLM papers? 🌊
Same here 😭

So I’m starting a Discrete Diffusion Reading Group (@diffusion_llms) with my favorite disciples Justin Deschenaux and Zhihan Yang ✨

We’ll cover everything—from theory to empirics, from language to molecules.

Join
Sander Dieleman (@sedielem) 's Twitter Profile Photo

Generative models used to be about capturing the training data distribution. Interestingly, this stopped being the case when we started actually using them 🤔 We tweak temperatures, use classifier-free guidance, and post-train to get a distribution better than the training data.
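
For context on the sampling tweaks mentioned here, a minimal illustrative sketch of temperature scaling and classifier-free guidance applied to toy logits; the guidance scale, temperature, and numbers are made up, not taken from any particular model.

```python
# Illustrative sketch of sampling tweaks that deliberately deviate from the
# training distribution: temperature scaling and classifier-free guidance.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

cond_logits = np.array([2.0, 1.0, 0.2, -1.0])   # prediction given the prompt
uncond_logits = np.array([1.2, 1.1, 0.8, 0.5])  # unconditional prediction

temperature = 0.7  # <1 sharpens the distribution (mode-seeking)
guidance = 2.0     # >1 extrapolates toward the conditional signal

# Classifier-free guidance on logits: uncond + w * (cond - uncond).
guided = uncond_logits + guidance * (cond_logits - uncond_logits)

print(softmax(cond_logits))           # the "raw" learned distribution
print(softmax(guided / temperature))  # the distribution we actually sample from
```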

Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie) 's Twitter Profile Photo

1/3

🚬 Ready to smell your GPUs burning?

Introducing MegaDLMs, the first production-level library for training diffusion language models, offering 3× faster training speed and up to 47% MFU.

Empowered by Megatron-LM and Transformer-Engine, it offers near-perfect linear
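
For reference, MFU (model FLOPs utilization) is the fraction of the hardware's peak FLOP/s actually spent on model math. A rough back-of-the-envelope sketch, with all numbers hypothetical rather than MegaDLMs benchmarks:

```python
# Rough MFU (model FLOPs utilization) estimate; all numbers are hypothetical.
def mfu(params: float, tokens_per_sec: float, peak_flops: float) -> float:
    # Common approximation: ~6 FLOPs per parameter per token for the
    # forward + backward pass of a dense transformer.
    achieved_flops_per_sec = 6 * params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops

# E.g. a 7B-parameter model at 11,000 tokens/s per GPU, on hardware with
# ~989e12 peak BF16 FLOP/s, lands near the quoted ~47% MFU.
print(f"{mfu(7e9, 11_000, 989e12):.0%}")  # -> 47%
```
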
Nomic (@nomic_ai) 's Twitter Profile Photo

AI systems excel in domains that have abundant coverage in internet data.

Large sectors of the economy are not digital-native. Their data, processes, and workflows are governed by signals that are out of distribution of foundation models.
 
Introducing the new Nomic Platform
Jatin Prakash (@bicycleman15) 's Twitter Profile Photo

New paper alert 🚨

What if I told you there is an architecture that provides a _knob_ to control quality-efficiency trade-offs directly at test-time?

Introducing Compress & Attend Transformers (CATs) that provide you exactly this!

🧵(1/n) 👇
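
The tweet doesn't spell out the mechanism; purely to illustrate what a test-time quality/efficiency knob can look like, here is a hypothetical sketch (not the CATs architecture) where a compression ratio shrinks the keys/values that attention sees:

```python
# Hypothetical illustration of a test-time quality/efficiency knob; this is
# NOT the CATs architecture, just the shape of the trade-off: attention over
# a compressed context, where `ratio` shrinks the number of keys/values.
import torch
import torch.nn.functional as F

def compressed_attention(q, k, v, ratio: int = 1):
    """q: (T, d); k, v: (S, d). ratio=1 is full attention; larger is cheaper."""
    if ratio > 1:
        # Average-pool keys/values along the sequence axis: S -> ceil(S / ratio).
        k = F.avg_pool1d(k.t().unsqueeze(0), kernel_size=ratio, ceil_mode=True).squeeze(0).t()
        v = F.avg_pool1d(v.t().unsqueeze(0), kernel_size=ratio, ceil_mode=True).squeeze(0).t()
    scores = q @ k.t() / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

q, k, v = torch.randn(16, 64), torch.randn(1024, 64), torch.randn(1024, 64)
for ratio in (1, 4, 16):  # the "knob": same weights, different compute at test time
    print(ratio, compressed_attention(q, k, v, ratio).shape)
```
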
Jaeyeon Kim (@jaeyeon_kim_0) 's Twitter Profile Photo

🚨🚨🚨 Now your Masked Diffusion Model can self-correct! We propose PRISM, a plug-and-play fine-tuning method that adds self-correction ability to any pretrained MDM! (1/N)
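
The tweet doesn't describe how PRISM works; as a rough illustration of what self-correction in a masked diffusion model can look like in general, here is a hypothetical remask-and-redenoise loop that masks the lowest-confidence tokens and resamples them. All names and thresholds are assumptions, not PRISM's method.

```python
# Hypothetical remask-and-redenoise loop for a masked diffusion model (MDM);
# an illustration of self-correction in general, NOT PRISM's actual method.
import torch

MASK_ID = 0  # assumed id of the [MASK] token

@torch.no_grad()
def self_correct(model, tokens, rounds: int = 3, remask_frac: float = 0.1):
    """tokens: (1, T) long tensor holding an already-decoded sequence."""
    for _ in range(rounds):
        probs = model(tokens).softmax(-1)                           # (1, T, vocab)
        conf = probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)   # (1, T)
        k = max(1, int(remask_frac * tokens.shape[1]))
        low = conf.topk(k, largest=False).indices[0]  # least-confident positions
        tokens = tokens.clone()
        tokens[0, low] = MASK_ID                      # remask them ...
        refreshed = model(tokens).argmax(-1)          # ... and re-denoise
        tokens[0, low] = refreshed[0, low]
    return tokens

# Toy usage with a random stand-in "model" that maps (1, T) ids to (1, T, V) logits.
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 100)
print(self_correct(dummy, torch.randint(1, 100, (1, 12))).shape)  # torch.Size([1, 12])
```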