Walter Hernandez (@walterhernandez)'s Twitter Profile
Walter Hernandez

@walterhernandez

ID: 144316574

Joined: 15-05-2010 23:33:11

4.4K Tweets

223 Followers

1.1K Following

Kimi.ai (@kimi_moonshot)'s Twitter Profile Photo

🚀 Introducing our new tech report: Muon is Scalable for LLM Training

We found that the Muon optimizer can be scaled up using the following techniques:
• Adding weight decay
• Carefully adjusting the per-parameter update scale

✨ Highlights:
• ~2x computational efficiency vs AdamW
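
As a sketch of what that might look like in practice, here is a minimal PyTorch rendering of a Muon-style step with the report's two additions. The Newton-Schulz coefficients follow the public Muon implementation; the 0.2 * sqrt(max(m, n)) factor is a paraphrase of the report's RMS-matching heuristic, and the hyperparameter values (`lr`, `beta`, `weight_decay`) are illustrative only.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately orthogonalize G via the quintic Newton-Schulz
    # iteration used by Muon (output is roughly semi-orthogonal).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # bring the spectral norm near 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                     # work with the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(W, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    # One Muon-style update with the two techniques from the report:
    # decoupled weight decay and a per-matrix update scale chosen to
    # keep the update RMS comparable to AdamW's.
    momentum.mul_(beta).add_(grad)                  # momentum buffer
    O = newton_schulz_orthogonalize(momentum)       # orthogonalized direction
    m, n = W.shape
    scale = 0.2 * max(m, n) ** 0.5                  # RMS-matching heuristic
    W.add_(O, alpha=-lr * scale)                    # apply scaled update
    W.mul_(1 - lr * weight_decay)                   # decoupled weight decay
```
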
Dr. Julie Gurner (@drgurner)'s Twitter Profile Photo

Justine Moore I know the human brain quite well, and could not disagree more. LLMs repeatedly prove themselves (thus far) unable to do the things brains do. For example, LLMs operate linearly & struggle to integrate unrelated and varied experiences & info to form completely new things.

Gergely Orosz (@gergelyorosz)'s Twitter Profile Photo

AI tools will reduce the need for software engineers the same way that no-code tools reduced this.

Being able to specify what software you want to build, how it should be structured, and how *exactly* it should work is... programming. And getting into the weeds, when needed.
Ethan Mollick (@emollick)'s Twitter Profile Photo

I am starting to think sycophancy is going to be a bigger problem than pure hallucination as LLMs improve. Models that won’t tell you directly when you are wrong (and justify your correctness) are ultimately more dangerous to decision-making than models that are sometimes wrong.

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z)'s Twitter Profile Photo

Both LLMs and LRMs are upper bounded by humanity's knowledge closure. True scientific discoveries are, by definition, outside of that closure. Ergo, LLMs/LRMs are great force multipliers to us; but don't support "Nobel this weekend" hype.. (🧵 from yesterday 👇👇👇)

Ravid Shwartz Ziv (@ziv_ravid)'s Twitter Profile Photo

So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on it, even with best-of-n selection? Unbelievable!

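For readers unfamiliar with the term: "best-of-n selection" just means sampling several candidate solutions and keeping the one a scorer prefers. A minimal sketch, where `generate` and `score` are hypothetical stand-ins for a model sampler and a verifier or reward model:

```python
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int = 8) -> str:
    # Sample n candidates independently, keep the highest-scoring one.
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```

Selection of this kind only helps if the scorer can actually recognize a correct solution, which is exactly what is hard on fresh IMO problems.
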
Derek Thompson (@dkthomp)'s Twitter Profile Photo

Yes. 

Writing is not a second thing that happens after thinking. The act of writing is an act of thinking. Writing *is* thinking.

Students, academics, and anyone else who outsources their writing to LLMs will find their screens full of words and their minds emptied of thought.
Paul Graham (@paulg)'s Twitter Profile Photo

If you want to start a software startup, you should still learn to program. Even if AI writes most of your code, you'll still be in the position of an engineering manager, and to be a good engineering manager you have to be a programmer yourself.

Santiago (@svpino)'s Twitter Profile Photo

Literally nobody knows what an agent is. I've seen many people referring to applications as "agents" as long as they use an LLM. Then we have those who talk about "agentic systems" and "agentic workflows." If you ask what they mean, they will start stuttering.

Andriy Burkov (@burkov)'s Twitter Profile Photo

On the time savings of using LLMs for coding: on one hand, you can code in 5 minutes what would take a day by hand. On the other hand, you can spend 2 days fixing a bug that would take only 5 minutes by hand.

Pedro Domingos (@pmddomingos)'s Twitter Profile Photo

It's no mystery why LLMs can learn in context. They're just doing nearest neighbor on the manifold learned by pretraining. See: arxiv.org/abs/2012.00152
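
A toy illustration of that nearest-neighbor reading of in-context learning, with random vectors standing in for points on the pretrained manifold; this is a schematic sketch, not the linked paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings of 8 in-context demonstrations (x_i, y_i)
# and a query, all living on some manifold learned during pretraining.
demos = rng.normal(size=(8, 64))
labels = np.array([0, 1, 0, 1, 1, 0, 0, 1])
query = demos[3] + 0.1 * rng.normal(size=64)   # a query near demo 3

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# "In-context learning" as 1-nearest-neighbor on the manifold:
sims = np.array([cosine(query, d) for d in demos])
print(labels[sims.argmax()])   # 1, the label of the closest demonstration
```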

clem 🤗 (@clementdelangue)'s Twitter Profile Photo

When you realize that open-source is at the frontier of AI despite:
- fewer GPUs
- less money
- less public and policy support
- no $100M salaries to attract talent
- with closed-source taking advantage and copying all the innovations of open-source without contributing back

Andriy Burkov (@burkov)'s Twitter Profile Photo

We have a very poor understanding of why deep neural networks like transformer models learn the parameters they learn. For example, in the paper below from 2013, the authors demonstrated that 5% of the weights of a trained deep neural network can be used to predict the values of the remaining 95%.
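
The flavor of that 2013 result can be shown with a toy low-rank layer: a few full columns pin down the column space, and a handful of observed entries per remaining column fix its coefficients. This is a schematic sketch of the redundancy argument, not the paper's actual method (which learns a smoothness prior over the weights), and the ~10% observed fraction here is chosen to make the linear algebra exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained layer: 100x100 weights of intrinsic rank 5.
m, n, k = 100, 100, 5
W = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))

# "Known" weights: k full columns (they span the column space) plus
# k randomly observed entries in each remaining column.
basis = W[:, :k]                    # m x k, spans col(W)
W_hat = np.zeros_like(W)
W_hat[:, :k] = basis
for j in range(k, n):
    idx = rng.choice(m, size=k, replace=False)       # observed rows
    coeffs, *_ = np.linalg.lstsq(basis[idx], W[idx, j], rcond=None)
    W_hat[:, j] = basis @ coeffs                     # predict the rest

print(np.allclose(W, W_hat))   # True: ~10% of entries recover the matrix
```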