Hanxu Hu (@huhanxu1) 's Twitter Profile
Hanxu Hu

@huhanxu1

1st Year PhD Student supervised by @RicoSennrich and @iatitov, Intern @MSFTResearch | Prev @EdinburghNLP | Interested in Language Models.

ID: 1508606336103325702

Link: http://hanxuhu.github.io · Joined: 29-03-2022 00:48:02

85 Tweets

135 Followers

336 Following

Christopher Manning (@chrmanning) 's Twitter Profile Photo

Re: “Every major breakthrough in AI has been American”: America does itself no favors when it overestimates its specialness. Yes, the center of the AI industry is the US (California!), but many of the breakthroughs of (neural, gradient-based) AI happened elsewhere: • LSTMs,

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Everyone: DeepSeek just appeared out of nowhere! 😱 Me: - DeepSeek Coder in 2023 - MoE in Feb - Math in Feb - VL in March - V2 in May - Coder V2 in June - Prover in August - V2.5 in September - VL 2 in December - V3 in December They've consistently shipped for 1+ years 😁

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

Finally took time to go over Dario's essay on DeepSeek and export control and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and big user of Claude* The first half of the essay reads like a lengthy attempt to justify that closed-source

Hanxu Hu (@huhanxu1) 's Twitter Profile Photo

Are you still using MGSM to evaluate the multilingual abilities of LLMs? Check out our BenchMAX paper! We curated a more comprehensive multilingual benchmark including reasoning, coding, long-context, and agent tasks!

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
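
As a rough illustration of the two components named in the tweet, below is a minimal NumPy sketch of block-wise sparse attention: keys are pooled into per-block means (coarse-grained compression), the top-scoring blocks are kept (fine-grained selection), and exact attention runs only over the retained tokens. Function and parameter names are illustrative assumptions, not the actual NSA implementation.

```python
# Minimal sketch of coarse-grained token compression + fine-grained token
# selection for sparse attention. Illustrative only, not NSA's real code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, block_size=16, top_blocks=4):
    """Attend one query to a long sequence by (1) compressing key blocks into
    block means, (2) selecting the top-scoring blocks, and (3) running dense
    attention only over tokens inside the selected blocks."""
    seq_len, dim = k.shape
    n_blocks = seq_len // block_size

    # Coarse-grained compression: one representative key per block (mean pool).
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, dim)
    coarse_k = k_blocks.mean(axis=1)                  # (n_blocks, dim)

    # Score the query against compressed keys and keep the best blocks.
    block_scores = coarse_k @ q / np.sqrt(dim)        # (n_blocks,)
    chosen = np.argsort(block_scores)[-top_blocks:]   # fine-grained selection

    # Gather tokens of the selected blocks and do exact attention there.
    idx = np.concatenate([np.arange(b * block_size, (b + 1) * block_size)
                          for b in sorted(chosen)])
    attn = softmax(k[idx] @ q / np.sqrt(dim))
    return attn @ v[idx]                              # (dim,)

# Toy usage: one 64-dim query over a 256-token context.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal((256, 64))
v = rng.standard_normal((256, 64))
print(sparse_attention(q, k, v).shape)  # (64,)
```
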
Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

🚀 Introducing our new tech report: Muon is Scalable for LLM Training

We found that the Muon optimizer can be scaled up using the following techniques: 
• Adding weight decay
• Carefully adjusting the per-parameter update scale

✨ Highlights:
• ~2x computational efficiency vs AdamW
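
Below is a hedged sketch of what a Muon-style step with those two additions could look like: the momentum buffer is approximately orthogonalized with Newton-Schulz iterations, a shape-dependent scale keeps the update magnitude comparable across weight matrices, and weight decay is applied in decoupled form. The 0.2·sqrt(max dim) per-parameter scale and the hyperparameters are stand-in assumptions, not the exact values from the report.

```python
# Hedged sketch of a Muon-style update with decoupled weight decay and a
# per-parameter update scale. Constants are assumptions for illustration.
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a 2-D gradient (the core Muon operation)."""
    a, b, c = 3.4445, -4.7750, 2.0315   # commonly quoted quintic coefficients
    x = g / (np.linalg.norm(g) + 1e-7)
    if g.shape[0] > g.shape[1]:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if g.shape[0] > g.shape[1] else x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    """One step: momentum -> orthogonalize -> per-parameter scale -> decayed update."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Shape-dependent scale so the update RMS stays comparable across weight
    # matrices of different sizes (assumed stand-in, not the report's value).
    scale = 0.2 * np.sqrt(max(w.shape))
    w = w - lr * (scale * update + weight_decay * w)  # decoupled weight decay
    return w, momentum
```
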
Wenhao Zhu (@wenhao_nlp) 's Twitter Profile Photo

🎉 Excited to share “Generalizing from Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning” 📄 (arxiv.org/pdf/2502.15592)

We propose "context synthesis": instead of generating instructions from long texts, we synthesize contexts for instructions—drawing
Scale ML (@scaleml) 's Twitter Profile Photo

We are excited to have Songlin Yang present: 
Linear Attention and Beyond 🚀🚀🚀

Time: Mar 5, 4pm EST, sign up at scale-ml.org to join our mailing list for the zoom link.
Quanta Magazine (@quantamagazine) 's Twitter Profile Photo

Andrew Barto and Richard Sutton have won the A.M. Turing Award for developing the theoretical foundations of reinforcement learning, a key method behind many major breakthroughs in artificial intelligence. 🧵
Slator (@slatornews) 's Twitter Profile Photo

University of Zurich and Huawei researchers explore how #LLMs can improve document-level 📄 #AI #translation by preserving context across segments 🔁 and integrating additional knowledge layers 💡 University of Zurich Universität Zürich Zurich Computational Linguistics Group Hanxu Hu Jannis Vamvas Rico Sennrich slator.com/how-large-lang…

Hanxu Hu (@huhanxu1) 's Twitter Profile Photo

Check out our new reward model calibration paper! We use Elo scores from ChatbotArena to calibrate RMs and mitigate the over-valuation problem of reward models. Really nice to work with my friends Xiao Zhu, Chenmien Tan, Pinzhen "Patrick" Chen, and my PhD supervisor Rico Sennrich!
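
One simple way such an Elo-based calibration could be set up is sketched below: fit an affine map from per-model mean reward-model scores to ChatbotArena Elo ratings, then use it to rescale raw RM scores. This is only an illustration of the idea under that assumption, not the method in the paper.

```python
# Hedged sketch: calibrate reward-model scores against ChatbotArena Elo by
# fitting a least-squares affine map over a set of models. Illustrative only.
import numpy as np

def fit_elo_calibration(mean_rm_scores, elo_ratings):
    """Least-squares fit of elo ~ a * rm_score + b across models."""
    x = np.asarray(mean_rm_scores, dtype=float)
    y = np.asarray(elo_ratings, dtype=float)
    a, b = np.polyfit(x, y, deg=1)
    return a, b

def calibrate(rm_score, a, b):
    """Map a raw reward-model score onto the Elo scale."""
    return a * rm_score + b

# Toy usage with made-up numbers: three models' mean RM scores and Arena Elo.
a, b = fit_elo_calibration([0.2, 0.5, 0.9], [1050, 1150, 1250])
print(round(calibrate(0.7, a, b)))
```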

Richard Sutton (@richardssutton) 's Twitter Profile Photo

This thread in Chinese does indeed seem to accurately communicate the main points of David Silver’s and my short paper on the Era of Experience. Thanks xingxb!

Edoardo Ponti (@pontiedoardo) 's Twitter Profile Photo

To appear at #NAACL2025 (2 orals, 1 poster)! Coleman Haley: which classes of words are most grounded on (perceptual proxies of) meaning? Uri Berger: how do image descriptions vary across languages and cultures? Hanxu Hu: can LLMs follow sequential instructions? 🧵below