Walter Hernandez (@walterhernandez)'s Twitter Profile
Walter Hernandez

@walterhernandez

ID: 144316574

Joined: 15-05-2010 23:33:11

4.4K Tweets

223 Followers

1.1K Following

Kimi.ai (@kimi_moonshot)'s Twitter Profile Photo

🚀 Introducing our new tech report: Muon is Scalable for LLM Training

We found that the Muon optimizer can be scaled up using the following techniques:
• Adding weight decay
• Carefully adjusting the per-parameter update scale

✨ Highlights:
• ~2x computational efficiency vs AdamW
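
As a sketch of what that might look like in practice, here is a minimal PyTorch rendering of a Muon-style step with the report's two additions. The Newton-Schulz coefficients follow the public Muon implementation; the 0.2 * sqrt(max(m, n)) factor is a paraphrase of the report's RMS-matching heuristic, and the hyperparameter values (`lr`, `beta`, `weight_decay`) are illustrative only.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately orthogonalize G via the quintic Newton-Schulz
    # iteration used by Muon (output is roughly semi-orthogonal).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # bring the spectral norm near 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                     # work with the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(W, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    # One Muon-style update with the two techniques from the report:
    # decoupled weight decay and a per-matrix update scale chosen to
    # keep the update RMS comparable to AdamW's.
    momentum.mul_(beta).add_(grad)                  # momentum buffer
    O = newton_schulz_orthogonalize(momentum)       # orthogonalized direction
    m, n = W.shape
    scale = 0.2 * max(m, n) ** 0.5                  # RMS-matching heuristic
    W.add_(O, alpha=-lr * scale)                    # apply scaled update
    W.mul_(1 - lr * weight_decay)                   # decoupled weight decay
```
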
Dr. Julie Gurner (@drgurner)'s Twitter Profile Photo

Justine Moore I know the human brain quite well, and could not disagree more. LLMs repeatedly prove themselves (thus far) unable to do the things brains do. For example, LLMs operate linearly & struggle to integrate unrelated and varied experiences & info to form completely new things.

Gergely Orosz (@gergelyorosz)'s Twitter Profile Photo

AI tools will reduce the need for software engineers the same way that no-code tools reduced this.

Being able to specify what software you want to build, how it should be structured, and how *exactly* it should work is... programming. And getting into the weeds, when needed.
Ethan Mollick (@emollick)'s Twitter Profile Photo

I am starting to think sycophancy is going to be a bigger problem than pure hallucination as LLMs improve. Models that won’t tell you directly when you are wrong (and justify your correctness) are ultimately more dangerous to decision-making than models that are sometimes wrong.

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z)'s Twitter Profile Photo

Both LLMs and LRMs are upper bounded by humanity's knowledge closure. True scientific discoveries are, by definition, outside of that closure. Ergo, LLMs/LRMs are great force multipliers to us; but don't support "Nobel this weekend" hype.. (🧵 from yesterday 👇👇👇)

Ravid Shwartz Ziv (@ziv_ravid)'s Twitter Profile Photo

So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on it, even with best-of-n selection? Unbelievable!

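For readers unfamiliar with the term: "best-of-n selection" just means sampling several candidate solutions and keeping the one a scorer prefers. A minimal sketch, where `generate` and `score` are hypothetical stand-ins for a model sampler and a verifier or reward model:

```python
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int = 8) -> str:
    # Sample n candidates independently, keep the highest-scoring one.
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```

Selection of this kind only helps if the scorer can actually recognize a correct solution, which is exactly what is hard on fresh IMO problems.
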
Derek Thompson (@dkthomp)'s Twitter Profile Photo

Yes. 

Writing is not a second thing that happens after thinking. The act of writing is an act of thinking. Writing *is* thinking.

Students, academics, and anyone else who outsources their writing to LLMs will find their screens full of words and their minds emptied of thought.
Paul Graham (@paulg)'s Twitter Profile Photo

If you want to start a software startup, you should still learn to program. Even if AI writes most of your code, you'll still be in the position of an engineering manager, and to be a good engineering manager you have to be a programmer yourself.

Santiago (@svpino)'s Twitter Profile Photo

Literally nobody knows what an agent is. I've seen many people referring to applications as "agents" as long as they use an LLM. Then we have those who talk about "agentic systems" and "agentic workflows." If you ask what they mean, they will start stuttering.

Andriy Burkov (@burkov)'s Twitter Profile Photo

On the time savings of using LLMs for coding: on one hand, you can code in 5 minutes what would take a day by hand. On the other hand, you can spend 2 days fixing a bug that would take only 5 minutes by hand.

Pedro Domingos (@pmddomingos)'s Twitter Profile Photo

It's no mystery why LLMs can learn in context. They're just doing nearest neighbor on the manifold learned by pretraining. See: arxiv.org/abs/2012.00152
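
A toy illustration of that nearest-neighbor reading of in-context learning, with random vectors standing in for points on the pretrained manifold; this is a schematic sketch, not the linked paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings of 8 in-context demonstrations (x_i, y_i)
# and a query, all living on some manifold learned during pretraining.
demos = rng.normal(size=(8, 64))
labels = np.array([0, 1, 0, 1, 1, 0, 0, 1])
query = demos[3] + 0.1 * rng.normal(size=64)   # a query near demo 3

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# "In-context learning" as 1-nearest-neighbor on the manifold:
sims = np.array([cosine(query, d) for d in demos])
print(labels[sims.argmax()])   # 1, the label of the closest demonstration
```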

clem 🤗 (@clementdelangue)'s Twitter Profile Photo

When you realize that open-source is at the frontier of AI despite:
- fewer GPUs
- less money
- less public and policy support
- no $100M salaries to attract talent
- with closed-source taking advantage and copying all the innovations of open-source without contributing back

Andriy Burkov (@burkov)'s Twitter Profile Photo

We have a very poor understanding of why deep neural networks like transformer models learn the parameters they learn. For example, in the paper below from 2013, the authors demonstrated that 5% of the weights of a trained deep neural network can be used to predict the values of the remaining 95%.
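
The flavor of that 2013 result can be shown with a toy low-rank layer: a few full columns pin down the column space, and a handful of observed entries per remaining column fix its coefficients. This is a schematic sketch of the redundancy argument, not the paper's actual method (which learns a smoothness prior over the weights), and the ~10% observed fraction here is chosen to make the linear algebra exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained layer: 100x100 weights of intrinsic rank 5.
m, n, k = 100, 100, 5
W = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))

# "Known" weights: k full columns (they span the column space) plus
# k randomly observed entries in each remaining column.
basis = W[:, :k]                    # m x k, spans col(W)
W_hat = np.zeros_like(W)
W_hat[:, :k] = basis
for j in range(k, n):
    idx = rng.choice(m, size=k, replace=False)       # observed rows
    coeffs, *_ = np.linalg.lstsq(basis[idx], W[idx, j], rcond=None)
    W_hat[:, j] = basis @ coeffs                     # predict the rest

print(np.allclose(W, W_hat))   # True: ~10% of entries recover the matrix
```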