Han Guo (@hanguo97)'s Twitter Profile
Han Guo

@hanguo97

PhD Student @MIT_CSAIL | Past: @LTIatCMU @MITIBMLab @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.

ID: 769279457387540480

Website: http://han-guo.info | Joined: 26-08-2016 21:04:50

2.2K Tweets

2.2K Followers

4.4K Following

Jack Cook (@jackcookjack) 's Twitter Profile Photo

Excited to announce that we'll get to present our work on modeling brain-like processing pathways at NeurIPS later this year! Check out our paper here: arxiv.org/abs/2506.02813

Manifest AI (@manifest__ai) 's Twitter Profile Photo

Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. manifestai.com/articles/relea…

Chenfeng_X (@chenfeng_x) 's Twitter Profile Photo

Happy to share that two of our papers were accepted to NeurIPS 2025 as #Spotlight papers!
1. 👼 Angles Don’t Lie: Unlocking Training-Efficient RL from a Model’s Own Signals
TL;DR: Token angles—the model’s self-generated signals—can reveal how well it grasps the data. By

Horace He (@chhillee) 's Twitter Profile Photo

Modular Manifolds: managed metrics (i.e., Muon) meets manifolds, making matrix magnitudes manageable. Or M^11, as I like to call it. Check out this great post by Jeremy Bernstein! It introduces some cool new ideas but also doubles as a great intro to optimization beyond Adam.

Jaemin Cho (on faculty job market) (@jmin__cho) 's Twitter Profile Photo

Bifrost-1 is accepted to NeurIPS 2025! 🥳 We let MLLM and Diffusion model communicate with patch-level CLIP latents, creating native alignment as MLLMs speak the visual language they already know. This leads to greater training efficiency and preserves the MLLM's original

Tao Chen (@taochenshh) 's Twitter Profile Photo

🚀 New Product Release
We're launching three new products:
🤖 Bimanual Manipulation Platform - Surface-mountable system designed for customization. Can be mounted on versatile surfaces to fit your specific needs.
🤲 Compliant Gripper - Features built-in vision 👁️ and tactile

Han Guo (@hanguo97) 's Twitter Profile Photo

Read the scaling book earlier (before recent updates), and learned a lot. Maybe it’s a good time to revisit some of these again!

Benjamin F Spector (@bfspector) 's Twitter Profile Photo

(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.

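The tweet lists a paged KV cache among the megakernel's features without unpacking it. Purely as a minimal illustrative sketch (not the megakernel's actual code; all names here are made up), a paged KV cache stores keys and values in fixed-size blocks and keeps a per-sequence page table, so memory is allocated block by block instead of as one contiguous buffer per sequence:

```python
import numpy as np

# Toy paged KV cache (illustrative only, not the released megakernel's implementation).
PAGE_SIZE, NUM_PAGES, NUM_HEADS, HEAD_DIM = 16, 1024, 8, 64

k_pages = np.zeros((NUM_PAGES, PAGE_SIZE, NUM_HEADS, HEAD_DIM), dtype=np.float16)
v_pages = np.zeros_like(k_pages)
free_pages = list(range(NUM_PAGES))   # simple free list of unused pages
page_table = {}                       # seq_id -> list of page indices
seq_lens = {}                         # seq_id -> number of cached tokens

def append_kv(seq_id, k, v):
    """Append one token's K/V (shape [NUM_HEADS, HEAD_DIM]) to a sequence's cache."""
    pos = seq_lens.get(seq_id, 0)
    if pos % PAGE_SIZE == 0:          # current page is full (or none allocated yet)
        page_table.setdefault(seq_id, []).append(free_pages.pop())
    page = page_table[seq_id][pos // PAGE_SIZE]
    k_pages[page, pos % PAGE_SIZE] = k
    v_pages[page, pos % PAGE_SIZE] = v
    seq_lens[seq_id] = pos + 1

def gather_kv(seq_id):
    """Reassemble the contiguous K/V needed for attention over this sequence."""
    n = seq_lens[seq_id]
    pages = page_table[seq_id]
    k = np.concatenate([k_pages[p] for p in pages])[:n]
    v = np.concatenate([v_pages[p] for p in pages])[:n]
    return k, v
```

Because sequences only reference page indices, blocks can be handed out and reclaimed independently, which is what makes mixed prefill+decode batches cheap to manage.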
Tao Chen (@taochenshh) 's Twitter Profile Photo

🚀 Low-cost whole-body teleoperation device released for Vega robot!
🔓 Open-source 💰 Low-cost 🎒 Portable 🦾 Whole-body control
See it live at our #CoRL booth! 🤖✨

Dan Alistarh (@dalistarh) 's Twitter Profile Photo

Introducing LLM.Q: Quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM on consumer GPUs with natively quantized matmuls, on single workstations. No datacenter required. Inspired by Andrej Karpathy's llm.c, but natively quantized.
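LLM.Q itself is written in CUDA/C++; as a rough sketch of what a natively quantized matmul generally involves (my assumption of the standard int8 recipe, not the project's implementation), both operands are quantized symmetrically, multiplied with integer accumulation, and the result is rescaled back to floating point:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x ≈ scale * q."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    """Quantize fp32 inputs, multiply with int32 accumulation, dequantize the result."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)   # integer accumulate
    return acc.astype(np.float32) * (sa * sb)         # back to fp32

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 32).astype(np.float32)
err = np.abs(int8_matmul(a, b) - a @ b).mean()
print(f"mean abs error vs fp32 matmul: {err:.4f}")
```

The memory and bandwidth savings come from storing and moving the int8 operands; the accumulation still happens at higher precision.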

Simran Arora (@simran_s_arora) 's Twitter Profile Photo

Intra-SM, inter-SM, and cross-GPU overlapping within a single kernel for Llama 70B! Rather than using compilers to fuse ops, we explore using an interpreter. While interpreters are often considered slow, the large granularity of ML operations and the low overheads of this impl. allow large
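To illustrate the compiler-versus-interpreter contrast the tweet draws (a toy sketch under my own assumptions, not the actual system), an interpreter simply walks a list of ops and dispatches each one; because every op is a large tensor operation, the per-step dispatch cost is negligible next to the op itself:

```python
import numpy as np

# Toy op interpreter (illustrative only): a "program" is a list of (op, args) steps
# executed one at a time against a dictionary of named tensors.
OPS = {
    "matmul": lambda env, a, b, out: env.__setitem__(out, env[a] @ env[b]),
    "relu":   lambda env, a, out:    env.__setitem__(out, np.maximum(env[a], 0)),
    "add":    lambda env, a, b, out: env.__setitem__(out, env[a] + env[b]),
}

def run(program, env):
    for op, *args in program:   # per-step dispatch overhead is tiny vs. each op's runtime
        OPS[op](env, *args)
    return env

env = {"x": np.random.randn(256, 512), "w1": np.random.randn(512, 512),
       "w2": np.random.randn(512, 128), "b": np.zeros(128)}
program = [("matmul", "x", "w1", "h"), ("relu", "h", "h"),
           ("matmul", "h", "w2", "y"), ("add", "y", "b", "y")]
run(program, env)
print(env["y"].shape)  # (256, 128)
```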

Danijar Hafner (@danijarh) 's Twitter Profile Photo

Excited to introduce Dreamer 4, an agent that learns to solve complex control tasks entirely inside of its scalable world model! 🌎🤖 Dreamer 4 pushes the frontier of world model accuracy, speed, and learning complex tasks from offline datasets. co-led with Wilson Yan

Jeremy Cohen (@deepcohen) 's Twitter Profile Photo

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With Alex Damian, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
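As standard background on the term (not a description of the central-flow construction itself): classical analysis says gradient descent is only stable when the sharpness, the top Hessian eigenvalue, stays below 2/η, and the edge of stability is the empirically observed regime where it hovers right at that threshold:

```latex
% Classical stability of gradient descent on a quadratic approximation (background only).
\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t), \qquad
\text{classical stability: } \lambda_{\max}\!\left(\nabla^2 L(\theta_t)\right) < \frac{2}{\eta}, \qquad
\text{edge of stability: } \lambda_{\max} \approx \frac{2}{\eta}.
```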

Mohit Bansal (@mohitban47) 's Twitter Profile Photo

🚨 Generalized Correctness Predictors:
➡️ LLMs have no better self-knowledge about their own correctness than other LLMs do.
➡️ Instead, we find that LLMs benefit from learning to predict the correctness (based on history) of many other models.
➡️ Training 1 GCM is strictly