chen zhuoming (@chenzhuoming911)'s Twitter Profile
chen zhuoming

@chenzhuoming911

Ph.D. @SCSATCMU; undergraduate @Tsinghua_Uni

ID: 1571117216543932417

Joined: 17-09-2022 12:42:03

56 Tweets

318 Followers

82 Following

Beidi Chen (@beidichen) 's Twitter Profile Photo

🤯🥳 Thrilled to see our MagicPIG (lsh-ai.com) inspiring DeepSeek's Native Sparse Attention design! We believe #sparsity is the key to scaling next-gen intelligence — from model parameters and contextual memory to lightning-fast inference. Instead of brute-force…

Beidi Chen (@beidichen) 's Twitter Profile Photo

🚀 So excited to see industry releases of long-context solutions!! Curious to see whether attention alternatives like DeepSeek NSA, Kimi.ai MoBA, and Qwen-Max 1M can truly reason over million-token contexts while capturing sparse relationships in noisy data. Our…

CMU School of Computer Science (@scsatcmu) 's Twitter Profile Photo

Huge thank you to NVIDIA Data Center for gifting a brand new #NVIDIADGX B200 to CMU’s Catalyst Research Group! This AI supercomputing system will afford Catalyst the ability to run and test their work on a world-class unified AI platform.

chen zhuoming (@chenzhuoming911) 's Twitter Profile Photo

🚨 Thrilled to present our Spotlight at #ICLR2025: "MagicPIG: LSH Sampling for Efficient LLM Generation" by Ranajoy Sadhukhan 🎉 💡 MagicPIG enables KV compression for long-context LLMs — where top-k falls short, sampling shines. ⚙️ Introduces CPU-GPU heterogeneous serving to boost…

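The core idea behind LSH-based KV sampling can be sketched in a few lines. This is only an illustration of the general technique, not MagicPIG's actual algorithm or estimator (see the paper for those); the data, hash size, and collision threshold below are all made up for the toy:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_bits = 64, 1000, 8

# Toy KV cache and query (hypothetical random data).
keys = rng.standard_normal((n, d))
values = rng.standard_normal((n, d))
query = rng.standard_normal(d)

# SimHash: random hyperplanes assign each vector an n_bits signature.
planes = rng.standard_normal((n_bits, d))
key_sig = keys @ planes.T > 0          # (n, n_bits)
q_sig = planes @ query > 0             # (n_bits,)

# Sample keys whose signatures mostly collide with the query's;
# keys with high similarity to the query collide more often.
matches = (key_sig == q_sig).sum(axis=1)
sampled = np.where(matches >= n_bits - 2)[0]

# Approximate attention over only the sampled entries,
# instead of a full pass over all n keys.
scores = keys[sampled] @ query / np.sqrt(d)
w = np.exp(scores - scores.max())
w /= w.sum()
out = w @ values[sampled]
print(sampled.size, out.shape)
```

The contrast with top-k is that sampling keeps a randomized, similarity-weighted subset rather than a hard cutoff, which is what the tweet's "where top-k falls short, sampling shines" refers to.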
chen zhuoming (@chenzhuoming911) 's Twitter Profile Photo

🚀 Thrilled to present MagicDec: Breaking the Latency–Throughput Tradeoff for Long Context Generation with Speculative Decoding at #ICLR2025! We break the long-standing inefficiency of speculative decoding — enabling ⚡ lower latency 📈 higher throughput 🔥 bigger speedups at…
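For readers unfamiliar with speculative decoding, here is a toy greedy version of the generic loop: a cheap draft model proposes a few tokens, and the expensive target model verifies them, accepting the longest matching prefix. The two "models" below are hypothetical stand-in functions, not MagicDec itself (MagicDec additionally drafts with a sparse-KV model so speedups hold at large batch sizes and long contexts):

```python
def draft_model(ctx):
    # cheap model: next token = (last + 1) % 10
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # expensive model: same rule, except after token 5 it emits 0
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

def speculative_decode(ctx, n_new, k=4):
    out = list(ctx)
    while len(out) < len(ctx) + n_new:
        # 1) draft k tokens cheaply
        draft, cur = [], list(out)
        for _ in range(k):
            t = draft_model(cur)
            draft.append(t)
            cur.append(t)
        # 2) verify: in a real system the target scores all k
        #    positions in one batched forward pass; it is sequential
        #    here only because the toy models are scalar functions
        cur = list(out)
        for t in draft:
            if target_model(cur) == t:
                out.append(t)
                cur.append(t)
            else:
                # first mismatch: take the target's token and stop
                out.append(target_model(cur))
                break
    return out[:len(ctx) + n_new]

print(speculative_decode([0], 8))  # → [0, 1, 2, 3, 4, 5, 0, 1, 2]
```

When the draft agrees with the target, several tokens are committed per expensive verification pass, which is where the latency savings come from.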

Xinle Cheng (@xinlec295) 's Twitter Profile Photo

⚡ 30+ FPS video generation is HERE! 💡 Our Next-Frame Diffusion (NFD) achieves 30+ FPS autoregressive video generation on an A100 GPU with SOTA quality! Try the demo: nextframed.github.io. Also huge thanks to Tianyu He…

Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n

Infini-AI-Lab (@infiniailab) 's Twitter Profile Photo

🚀 Excited to introduce our latest work GRESO: a method that identifies and skips millions of uninformative prompts before rollout and achieves up to 2.0x wall-clock time speedup in training. More rollouts lead to better model performance, but they're also a major bottleneck in…

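One intuition for why skipping prompts before rollout can be lossless: in GRPO-style RL training, a prompt whose rollouts all receive the same reward yields zero advantage and hence no gradient. GRESO's actual selection rule is in the paper; the sketch below is only a hypothetical illustration of that filtering idea, skipping prompts whose recent reward history shows no variance (all names and thresholds are made up):

```python
from collections import defaultdict, deque

# prompt id -> mean rewards from the last few epochs (window of 3)
history = defaultdict(lambda: deque(maxlen=3))

def informative(prompt_id):
    h = history[prompt_id]
    if len(h) < h.maxlen:
        return True  # not enough history yet: keep the prompt
    # all-solved or all-failed recently -> likely zero advantage, skip
    return not (all(r == 1.0 for r in h) or all(r == 0.0 for r in h))

def record(prompt_id, mean_reward):
    history[prompt_id].append(mean_reward)

# usage: an easy prompt gets filtered after three perfect epochs
for _ in range(3):
    record("easy", 1.0)
record("hard", 0.4)
print(informative("easy"), informative("hard"))  # → False True
```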
chen zhuoming (@chenzhuoming911) 's Twitter Profile Photo

Actually, a very useful functionality. When I evaluated AIME for the first time, it took me two to three days to find a repository that gives correct results. Some evaluations take Hugging Face several days to run, and SGLang/vLLM is just too complicated (though faster)…

chen zhuoming (@chenzhuoming911) 's Twitter Profile Photo

This is my very first time winning a real paper award! Thanks to On-Device Learning for Foundation Models @ICML 25! I hold the belief that sparsity will enable real AGI, accessible to everyone, before it becomes a circus confined to 1M+ GPU clusters.