Hanxu Hu (@huhanxu1) 's Twitter Profile
Hanxu Hu

@huhanxu1

1st Year PhD Student supervised by @RicoSennrich and @iatitov, Intern @MSFTResearch | Prev @EdinburghNLP | Interested in Language Models.

ID: 1508606336103325702

Link: http://hanxuhu.github.io · Joined: 29-03-2022 00:48:02

85 Tweets

135 Followers

336 Following

Christopher Manning (@chrmanning) 's Twitter Profile Photo

Re: “Every major breakthrough in AI has been American”: America does itself no favors when it overestimates its specialness. Yes, the center of the AI industry is the US (California!), but many of the breakthroughs of (neural, gradient-based) AI happened elsewhere: • LSTMs,

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Everyone: DeepSeek just appeared out of nowhere! 😱 Me: - DeepSeek Coder in 2023 - MoE in Feb - Math in Feb - VL in March - V2 in May - Coder V2 in June - Prover in August - V2.5 in September - VL 2 in December - V3 in December They've consistently shipped for 1+ years 😁

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

Finally took time to go over Dario's essay on DeepSeek and export control and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and big user of Claude* The first half of the essay reads like a lengthy attempt to justify that closed-source

Hanxu Hu (@huhanxu1) 's Twitter Profile Photo

Are you still using MGSM to evaluate the multilingual abilities of LLMs? Check out our BenchMAX paper! We curated a more comprehensive multilingual benchmark including reasoning, coding, long-context, and agent tasks!

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
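
As a rough illustration of the two components named in the tweet, below is a minimal NumPy sketch of block-wise sparse attention: keys are pooled into per-block means (coarse-grained compression), the top-scoring blocks are kept (fine-grained selection), and exact attention runs only over the retained tokens. Function and parameter names are illustrative assumptions, not the actual NSA implementation.

```python
# Minimal sketch of coarse-grained token compression + fine-grained token
# selection for sparse attention. Illustrative only, not NSA's real code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, block_size=16, top_blocks=4):
    """Attend one query to a long sequence by (1) compressing key blocks into
    block means, (2) selecting the top-scoring blocks, and (3) running dense
    attention only over tokens inside the selected blocks."""
    seq_len, dim = k.shape
    n_blocks = seq_len // block_size

    # Coarse-grained compression: one representative key per block (mean pool).
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, dim)
    coarse_k = k_blocks.mean(axis=1)                  # (n_blocks, dim)

    # Score the query against compressed keys and keep the best blocks.
    block_scores = coarse_k @ q / np.sqrt(dim)        # (n_blocks,)
    chosen = np.argsort(block_scores)[-top_blocks:]   # fine-grained selection

    # Gather tokens of the selected blocks and do exact attention there.
    idx = np.concatenate([np.arange(b * block_size, (b + 1) * block_size)
                          for b in sorted(chosen)])
    attn = softmax(k[idx] @ q / np.sqrt(dim))
    return attn @ v[idx]                              # (dim,)

# Toy usage: one 64-dim query over a 256-token context.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal((256, 64))
v = rng.standard_normal((256, 64))
print(sparse_attention(q, k, v).shape)  # (64,)
```
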
Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

🚀 Introducing our new tech report: Muon is Scalable for LLM Training

We found that the Muon optimizer can be scaled up using the following techniques: 
• Adding weight decay
• Carefully adjusting the per-parameter update scale

✨ Highlights:
• ~2x computational efficiency vs AdamW
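
Below is a hedged sketch of what a Muon-style step with those two additions could look like: the momentum buffer is approximately orthogonalized with Newton-Schulz iterations, a shape-dependent scale keeps the update magnitude comparable across weight matrices, and weight decay is applied in decoupled form. The 0.2·sqrt(max dim) per-parameter scale and the hyperparameters are stand-in assumptions, not the exact values from the report.

```python
# Hedged sketch of a Muon-style update with decoupled weight decay and a
# per-parameter update scale. Constants are assumptions for illustration.
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a 2-D gradient (the core Muon operation)."""
    a, b, c = 3.4445, -4.7750, 2.0315   # commonly quoted quintic coefficients
    x = g / (np.linalg.norm(g) + 1e-7)
    if g.shape[0] > g.shape[1]:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if g.shape[0] > g.shape[1] else x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    """One step: momentum -> orthogonalize -> per-parameter scale -> decayed update."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Shape-dependent scale so the update RMS stays comparable across weight
    # matrices of different sizes (assumed stand-in, not the report's value).
    scale = 0.2 * np.sqrt(max(w.shape))
    w = w - lr * (scale * update + weight_decay * w)  # decoupled weight decay
    return w, momentum
```
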
Wenhao Zhu (@wenhao_nlp) 's Twitter Profile Photo

🎉 Excited to share “Generalizing from Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning” 📄 (arxiv.org/pdf/2502.15592)

We propose "context synthesis": instead of generating instructions from long texts, we synthesize contexts for instructions—drawing
Scale ML (@scaleml) 's Twitter Profile Photo

We are excited to have Songlin Yang present: 
Linear Attention and Beyond 🚀🚀🚀

Time: Mar 5, 4pm EST, sign up at scale-ml.org to join our mailing list for the zoom link.
Quanta Magazine (@quantamagazine) 's Twitter Profile Photo

Andrew Barto and Richard Sutton have won the A.M. Turing Award for developing the theoretical foundations of reinforcement learning, a key method behind many major breakthroughs in artificial intelligence. 🧵
Slator (@slatornews) 's Twitter Profile Photo

University of Zurich and Huawei researchers explore how #LLMs can improve document-level 📄 #AI #translation by preserving context across segments 🔁 and integrating additional knowledge layers 💡 University of Zurich Universität Zürich Zurich Computational Linguistics Group Hanxu Hu Jannis Vamvas Rico Sennrich slator.com/how-large-lang…

Hanxu Hu (@huhanxu1) 's Twitter Profile Photo

Check out our new reward model calibration paper! We use Elo scores from ChatbotArena to calibrate RMs and mitigate the over-valuation problem of reward models. Really nice to work with my friends Xiao Zhu, Chenmien Tan, Pinzhen "Patrick" Chen, and my PhD supervisor Rico Sennrich!
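
One simple way such an Elo-based calibration could be set up is sketched below: fit an affine map from per-model mean reward-model scores to ChatbotArena Elo ratings, then use it to rescale raw RM scores. This is only an illustration of the idea under that assumption, not the method in the paper.

```python
# Hedged sketch: calibrate reward-model scores against ChatbotArena Elo by
# fitting a least-squares affine map over a set of models. Illustrative only.
import numpy as np

def fit_elo_calibration(mean_rm_scores, elo_ratings):
    """Least-squares fit of elo ~ a * rm_score + b across models."""
    x = np.asarray(mean_rm_scores, dtype=float)
    y = np.asarray(elo_ratings, dtype=float)
    a, b = np.polyfit(x, y, deg=1)
    return a, b

def calibrate(rm_score, a, b):
    """Map a raw reward-model score onto the Elo scale."""
    return a * rm_score + b

# Toy usage with made-up numbers: three models' mean RM scores and Arena Elo.
a, b = fit_elo_calibration([0.2, 0.5, 0.9], [1050, 1150, 1250])
print(round(calibrate(0.7, a, b)))
```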

Richard Sutton (@richardssutton) 's Twitter Profile Photo

This thread in Chinese does indeed seem to accurately communicate the main points of David Silver’s and my short paper on the Era of Experience. Thanks xingxb!

Edoardo Ponti (@pontiedoardo) 's Twitter Profile Photo

To appear at #NAACL2025 (2 orals, 1 poster)! Coleman Haley: which classes of words are most grounded on (perceptual proxies of) meaning? Uri Berger: how do image descriptions vary across languages and cultures? Hanxu Hu: can LLMs follow sequential instructions? 🧵below