Eric Hartford (@theerichartford) Twitter Tweets • TwiCopy

🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋 1/n

Likes: 37,37K · Replies: 2,2K · Reposts: 7,7K

Awni Hannun

@awnihannun

7 months ago

DeepSeek R1 671B running on 2 M2 Ultras faster than reading speed. Getting close to open-source O1, at home, on consumer hardware. With mlx.distributed and mlx-lm, 3-bit quantization (~4 bpw)

Likes: 5,5K · Replies: 128 · Reposts: 624

Eric Hartford

@theerichartford

7 months ago

I'm selling my Cybertruck, I want a Prius Prime instead. (TBH if they made a plug-in hybrid Mustang Mach-E that's what I'd get.)

Likes: 0 · Replies: 0 · Reposts: 0

Andrej Karpathy

@karpathy

6 months ago

I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed

Likes: 14,14K · Replies: 381 · Reposts: 2,2K

N8 Programs

@n8programs

6 months ago

reading a deepseek paper and stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA* *that requires additional reward functions to be defined. But the fundamental insight - that all these training methods can be

Likes: 2,2K · Replies: 31 · Reposts: 436

Eric Hartford

@theerichartford

6 months ago

thewatchinghawk Teknium (e/λ) Stealing means taking something that doesn't belong to you. OpenAI's ToS explicitly states that the output belongs to you. If they were the first party consumer of the API, then they violated a ToS contract. erichartford.com/demystifying-o…

Likes: 5 · Replies: 3 · Reposts: 1

Thinking Machines

@thinkymachines

6 months ago

Today, we are excited to announce Thinking Machines Lab (thinkingmachines.ai), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT,

Likes: 4,4K · Replies: 300 · Reposts: 513

Hunyuan

@tencenthunyuan

5 months ago

🚀 Introducing Hunyuan-T1! 🌟 Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powered by Hunyuan TurboS, it's built for speed, accuracy, and efficiency. 🔥 ✅ Hybrid-Mamba-Transformer MoE Architecture – The first of its kind for ultra-large-scale reasoning ✅ Strong

Likes: 1,1K · Replies: 74 · Reposts: 260

Tri Dao

@tri_dao

5 months ago

Top-tier reasoning model w Mamba hybrid arch to make inference go brrr

Likes: 184 · Replies: 3 · Reposts: 56

Tri Dao

@tri_dao

5 months ago

Very strong 8B and 56B Mamba hybrid models trained to 20T tokens, on 6K H100s, with FP8! This answers many of the open questions since we started working on Mamba: high quality, large scale, long context, multimodal & low precision. Props to the NVIDIA ADLR team!

Likes: 347 · Replies: 6 · Reposts: 57

Prasanna S

@myprasanna

4 months ago

My name is Prasanna, who previously founded Rippling (worth $10B); I'm going through a divorce. I'm now on the run from the Chennai police hiding outside of Tamil Nadu. This is my story.

Likes: 78,78K · Replies: 3,3K · Reposts: 18,18K

mansin

@mankaran32

4 months ago

Sim2real is somewhere starting to work for 12v servos. The way it balances is cool

Likes: 578 · Replies: 12 · Reposts: 31

anirudh

@kamathematic

4 months ago

We evaluated EVERY LLM under the sun (including Llama 4) with Stagehand The results are fascinating: what LLMs can actually consistently parse deeply nested structured data like a DOM/a11y tree? Check out our full blog post in 🧵

Likes: 598 · Replies: 36 · Reposts: 40

Jeremy Howard

@jeremyphoward

2 months ago

This is a really fun lesson btw :)

Likes: 84 · Replies: 4 · Reposts: 6

"Gkid" Grime Grown kid with a grown kid of his own

@digbysharples

a month ago

Thank you Eric Hartford for the awesome Dolphin Mixtral. it like pushed itself to claim the name of my storytelling narrator agent in the stack of narrators ive built up ..

Likes: 8 · Replies: 2 · Reposts: 3

Eric Hartford

@cognitivecompai

a month ago

clem 🤗 I love the long context recipe!!!

Likes: 6 · Replies: 0 · Reposts: 1

Eric Hartford

@theerichartford

25 days ago

I'm Eric Hartford, and I approve (and wrote) this content :-)

Likes: 10 · Replies: 0 · Reposts: 0

Wing Lian (caseus)

@winglian

23 days ago

The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle. - The most recent vllm only supports PyTorch==2.7.0 - vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of

Likes: 282 · Replies: 11 · Reposts: 17

Eric Hartford

@theerichartford

22 days ago

Trump is making himself look guilty.

Likes: 3 · Replies: 1 · Reposts: 0

Eric Hartford

@theerichartford

14 days ago

Wow, Uber. And why couldn't you have simply enabled the feature equally for both genders? You wanted to make a special point of hating males, in particular?

Likes: 6 · Replies: 2 · Reposts: 0