Edward Kmett (@kmett)'s Twitter Profile
Edward Kmett

@kmett

Founder/Chief Scientist @Positron_AI

Haskell, category theory, AI, and safety.

calendly.com/ekmett
github.com/ekmett
🦋 @kmett.ai

ID: 16781974

http://positron.ai/ · Joined 15-10-2008 13:25:52

13.13K Tweets

15.15K Followers

820 Following

Edward Kmett (@kmett):

Since DeepSeek-V3-Base dropped on Christmas, a lot of folks have asked me why the models that DeepSeek builds are so good, why they were able to train it so cheaply and what this means. With DeepSeek-R1 and its distillations performing as well as they are, it is worth noting

Edward Kmett (@kmett):

I really don't understand the first-order panic reaction from analysts that is leading folks to short $NVDA because of DeepSeek's existence. This seems like an incredibly short-sighted reason to er.. short. The first wave of reporting was that DeepSeek-V3-Base demonstrated that

Edward Kmett (@kmett):

I do appreciate the discounted sale price being offered for this entire sector. Um, but, er.. if everyone is down, well, just out of curiosity, what do traders expect these models to be run on exactly?

Edward Kmett (@kmett):

I'm extremely excited to welcome on board our long-time friend Mitesh Agrawal (Mitesh) as our new CEO here at @Positron_AI. As we now switch into full bore production, I am personally switching over to "Founder/Chief Scientist", and Thomas Sohmers will be sliding over into the

Mitesh (@mitesh711):

I am super excited and we are already shipping!! DM me if interested in buying Positron AI racks. Hugging Face transformers run out of the box at >3x perf/$ and perf/watt today for <70B models of llama, gemma, phi and mistral (deepseek r1 soon - distilled models run today).

Zhengyao Jiang (@zhengyaojiang):

Training LLMs with Reinforcement Learning (RL) isn’t a new idea. So why does it suddenly seem to be working now (o1/DeepSeek)? Here are a few theories and my thoughts on each of them: (1/N)

Positron AI (@positron_ai):

Just 2 weeks after Mitesh joined us as CEO, we are very excited to announce that Positron AI has closed $23.5M in seed funding, with new investors including Valor, Atreides Management, LP, and Flume Ventures. businesswire.com/news/home/2025… Even more exciting announcements ahead!

davidad 🎇 (@davidad):

Frog put the CoT in a stop_gradient() box. “There,” he said. “Now there will not be any optimization pressure on the CoT.” “But there is still selection pressure,” said Toad. “That is true,” said Frog.
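
(A toy sketch of the point, my own illustration rather than anything from the thread: even if no gradient flows through the chain of thought, picking the best of n sampled CoTs by reward still shifts their distribution, so selection pressure survives without optimization pressure. The sample scores and reward function below are invented purely for illustration.)

import Data.List (maximumBy)
import Data.Ord (comparing)

-- stand-in "chain of thought" samples: each is just a score here
cots :: [Double]
cots = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.8, 0.6]

-- an invented reward; nothing is differentiated anywhere
reward :: Double -> Double
reward = id

-- best-of-n selection over consecutive groups of n samples
bestOfN :: Int -> [Double] -> [Double]
bestOfN _ [] = []
bestOfN n xs = let (grp, rest) = splitAt n xs
               in maximumBy (comparing reward) grp : bestOfN n rest

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
  putStrLn ("mean CoT score before selection: " ++ show (mean cots))
  putStrLn ("mean after best-of-4 selection:  " ++ show (mean (bestOfN 4 cots)))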

Edward Kmett (@kmett):

I originally wrote this library to er.. re-answer a stack overflow question. It has since been used for everything from analyzing high energy physics data to machine learning to computer graphics to pick and place machines to keeping flying cars aloft with people in them. The

Edward Kmett (@kmett):

It seems the Disney live-action remake pipeline has finally made it to Pixels (2015). Bold choice not to have Adam Sandler reprise his role.

Edward Kmett (@kmett):

Today I used Sutherland's logical effort to reason about prefix adders. With it I found g (logical effort), p (parasitic delay), and h (load), and applied them to paths for g (generate), p (propagate), and h (Ling-style pseudo-carries). No notational confusion ensued. *cough* None.
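
(For reference, a minimal sketch of the logical-effort arithmetic, Sutherland's g/p/h rather than the adder's: a stage has delay d = g*h + p, and an N-stage path with path effort F = G*B*H has minimum delay N*F^(1/N) + P. The gate values below are the usual textbook static-CMOS numbers, used purely as an example and not taken from the tweet.)

-- stage parameters: logical effort g and parasitic delay p
data Stage = Stage { g :: Double, p :: Double }

-- minimum delay (in units of tau) of a path through the given stages,
-- for total electrical effort h = Cout/Cin and branching effort b:
--   F = G * B * H,  Dmin = N * F^(1/N) + P
minPathDelay :: [Stage] -> Double -> Double -> Double
minPathDelay stages h b = n * f ** (1 / n) + sumP
  where
    n    = fromIntegral (length stages)
    bigG = product (map g stages)   -- path logical effort G
    f    = bigG * b * h             -- path effort F
    sumP = sum (map p stages)       -- total parasitic delay P

-- textbook example: inverter, NAND2, NOR2 driving a fanout of 10
main :: IO ()
main = print (minPathDelay [Stage 1 1, Stage (4/3) 2, Stage (5/3) 2] 10 1)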

Edward Kmett (@kmett):

It is incredible just how disjoint the support for SystemVerilog language features is between, say, Verilator and Genus. Verilator: What is a parameterized function or type? The only way you can parameterize either of those is if you shove that in an interface and put the

Edward Kmett (@kmett):

One problem with leaning so hard on closed models is that on some days they just take stupid pills. Maybe it is throttling, some kind of meta-level change in the way they do chain-of-thought, whatever. Who knows? The model can't tell me. Today it seems ChatGPT is "being the

Edward Kmett (@kmett):

I can't be the only one who gets this UI bug in ChatGPT almost all the time. It gets stuck thinking it's talking, but it has finished talking. I can't interrupt it with the stop button because it's not talking. And I can't respond because it thinks it is. The chat dies,
