Eric Hartford (@theerichartford)'s Twitter Profile
Eric Hartford

@theerichartford

Principal Applied AI Researcher

ID: 1798880282474385408

07-06-2024 00:52:02

249 Tweets

697 Followers

71 Following

DeepSeek (@deepseek_ai)

🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋 1/n

Awni Hannun (@awnihannun)

DeepSeek R1 671B running on 2 M2 Ultras faster than reading speed. Getting close to open-source O1, at home, on consumer hardware. With mlx.distributed and mlx-lm, 3-bit quantization (~4 bpw)

Eric Hartford (@theerichartford)

I'm selling my Cybertruck, I want a Prius Prime instead. (TBH if they made a plug-in hybrid Mustang Mach-E that's what I'd get.)

Andrej Karpathy (@karpathy)

I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed

N8 Programs (@n8programs)

reading a deepseek paper and stumbled upon a very beautiful formula where they unify SFT and MOST RL TYPES (DPO, PPO, GRPO, etc.) into ONE FORMULA* *that requires additional reward functions to be defined. But the fundamental insight - that all these training methods can be

Eric Hartford (@theerichartford)

thewatchinghawk Teknium (e/λ) Stealing means taking something that doesn't belong to you. OpenAI's ToS explicitly states that the output belongs to you. If they were the first party consumer of the API, then they violated a ToS contract. erichartford.com/demystifying-o…

Thinking Machines (@thinkymachines)

Today, we are excited to announce Thinking Machines Lab (thinkingmachines.ai), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT,

Hunyuan (@tencenthunyuan)

🚀 Introducing Hunyuan-T1! 🌟 Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powered by Hunyuan TurboS, it's built for speed, accuracy, and efficiency. 🔥 ✅ Hybrid-Mamba-Transformer MoE Architecture – The first of its kind for ultra-large-scale reasoning ✅ Strong

Tri Dao (@tri_dao)

Very strong 8B and 56B Mamba hybrid models trained to 20T tokens, on 6K H100s, with FP8! This answers many of the open questions since we started working on Mamba: high quality, large scale, long context, multimodal & low precision. Props to the NVIDIA ADLR team!

Prasanna S (@myprasanna)

My name is Prasanna, who previously founded Rippling (worth $10B); I'm going through a divorce. I'm now on the run from the Chennai police hiding outside of Tamil Nadu. This is my story.

anirudh (@kamathematic)

We evaluated EVERY LLM under the sun (including Llama 4) with Stagehand. The results are fascinating: what LLMs can actually consistently parse deeply nested structured data like a DOM/a11y tree? Check out our full blog post in 🧵

"Gkid" Grime Grown kid with a grown kid of his own (@digbysharples)

Thank you Eric Hartford for the awesome Dolphin Mixtral. It like pushed itself to claim the name of my storytelling narrator agent in the stack of narrators I've built up...

Wing Lian (caseus) (@winglian)

The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle. - The most recent vllm only supports PyTorch==2.7.0 - vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of

Eric Hartford (@theerichartford)

Wow, Uber. And why couldn't you have simply enabled the feature equally for both genders? You wanted to make a special point of hating males, in particular?