Joy Dong (@joychew_d) 's Twitter Profile
Joy Dong

@joychew_d

PhD candidate @UMich. Previously @PyTorch @NVidia. #ConfidentialComputing #GPU Optimization & Architecture

ID: 419348267

Joined: 23-11-2011 07:42:53

20 Tweets

142 Followers

50 Following

Hari Sadasivan PhD (@iamharisankar) 's Twitter Profile Photo

🌟1/3: Introducing mm2-gb, GPU-accelerated Minimap2 for long-read DNA mapping! 🔥 mm2-gb accelerates Minimap2's bottleneck (chaining) on GPUs without compromising accuracy. 🚀 Kudos to Xueshen Liu, Joy Dong, Satish Narayanasamy & Gina Sitaraman, Computer Science and Engineering at Michigan, AMD, Michigan Engineering

Horace He (@chhillee) 's Twitter Profile Photo

For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10

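For readers unfamiliar with the API the thread describes, here is a minimal sketch of how an attention variant is expressed with FlexAttention (assuming PyTorch 2.5+; the shapes and the causal mask are illustrative, not code from the thread):

```python
# A minimal sketch of the FlexAttention API (PyTorch >= 2.5), for illustration only.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 8, 1024, 64  # illustrative shapes
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

# An attention variant is expressed as a small Python function; here, causal masking.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

# torch.compile lowers the variant into a fused attention kernel.
flex = torch.compile(flex_attention)
out = flex(q, k, v, block_mask=block_mask)
```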
Joy Dong (@joychew_d) 's Twitter Profile Photo

Excited to announce PyTorch support for customizable score modification for attention kernels! Stay tuned for Chapter 2: Inference and GQA support🥳
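As a rough illustration of the customizable score modification mentioned above, the sketch below adds an ALiBi-style bias through a score_mod callable (the bias and shapes are assumptions for illustration, not taken from the tweet):

```python
# A minimal sketch of a custom score modification (score_mod) with FlexAttention.
import torch
from torch.nn.attention.flex_attention import flex_attention

H = 8
alibi_slopes = torch.exp2(-torch.arange(1, H + 1, device="cuda").float())

# score_mod receives the raw attention score plus batch/head/query/key indices
# and returns the modified score; FlexAttention fuses it into the kernel.
def alibi_bias(score, b, h, q_idx, kv_idx):
    return score + alibi_slopes[h] * (kv_idx - q_idx)

q = torch.randn(1, H, 512, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

out = torch.compile(flex_attention)(q, k, v, score_mod=alibi_bias)
```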

Joy Dong (@joychew_d) 's Twitter Profile Photo

I'll be at ACM BCB'24 in Shenzhen, China from 22-25 Nov to present our work mm2-gb and how we boost minimap2 performance using GPUs. If you are running Minimap2, please check out mm2-gb if you haven't already. github.com/Minimap2onGPU/… DM me if you're around and willing to chat!

Joy Dong (@joychew_d) 's Twitter Profile Photo

Our preprint for FlexAttention is available on arXiv: arxiv.org/abs/2412.05496! Check it out for more technical details on how FlexAttention works and how we optimized it.

Joy Dong (@joychew_d) 's Twitter Profile Photo

🚀 Excited to see FlexAttention used in real-world research! We've recently released a preprint on this: arxiv.org/abs/2412.05496 -- check it out for more details! We are writing a second blog post on how to use FlexAttention for inference. Stay tuned!

PyTorch (@pytorch) 's Twitter Profile Photo

FlexAttention’s decoding backend is now optimized for inference—supporting GQA, PagedAttention, nested jagged tensors, trainable biases, and more. Read our latest blog for performance tuning guidance and examples using torchtune and gpt-fast: 🔗 hubs.la/Q03ktGsH0 #PyTorch

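As a rough illustration of the inference features listed above, the sketch below runs a single grouped-query attention (GQA) decode step; the enable_gqa flag and all shapes are assumptions based on the public flex_attention signature, not code from the blog:

```python
# A minimal sketch of a GQA decode step with FlexAttention (assumed enable_gqa flag).
import torch
from torch.nn.attention.flex_attention import flex_attention

B, Hq, Hkv, S_kv, D = 1, 32, 8, 2048, 128  # 32 query heads share 8 KV heads
q = torch.randn(B, Hq, 1, D, device="cuda", dtype=torch.float16)      # one new token
k = torch.randn(B, Hkv, S_kv, D, device="cuda", dtype=torch.float16)  # cached keys
v = torch.randn(B, Hkv, S_kv, D, device="cuda", dtype=torch.float16)  # cached values

# enable_gqa lets groups of query heads attend to shared KV heads
# without materializing repeated key/value tensors.
out = torch.compile(flex_attention)(q, k, v, enable_gqa=True)
```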
Joy Dong (@joychew_d) 's Twitter Profile Photo

Super excited to release FlexAttention for Inference with a decoding backend, GQA, PagedAttention, trainable biases, and more! Meet us at the MLSys '25 conference in Santa Clara -- we will present FlexAttention on Wed, May 14. #MLSys