Edward Kmett (@kmett)'s Twitter Profile
Edward Kmett

@kmett

Founder/Chief Scientist @Positron_AI

Haskell, category theory, AI, and safety.

calendly.com/ekmett
github.com/ekmett
🦋 @kmett.ai

ID: 16781974

http://positron.ai/ · Joined 15-10-2008 13:25:52

13.13K Tweets

15.15K Followers

820 Following

Edward Kmett (@kmett):

Since DeepSeek-V3-Base dropped on Christmas, a lot of folks have asked me why the models that DeepSeek builds are so good, why they were able to train it so cheaply and what this means. With DeepSeek-R1 and its distillations performing as well as they are, it is worth noting

Edward Kmett (@kmett):

I really don't understand the first-order panic reaction from analysts that is leading folks to short $NVDA because of DeepSeek's existence. This seems like an incredibly short-sighted reason to er.. short. The first wave of reporting was that DeepSeek-V3-Base demonstrated that

Edward Kmett (@kmett):

I do appreciate the discounted sale price being offered for this entire sector. Um, but, er.. if everyone is down, well, just out of curiosity, what do traders expect these models to be run on exactly?

Edward Kmett (@kmett):

I'm extremely excited to welcome on board our long-time friend Mitesh Agrawal (Mitesh) as our new CEO here at @Positron_AI. As we now switch into full bore production, I am personally switching over to "Founder/Chief Scientist", and Thomas Sohmers will be sliding over into the

Mitesh (@mitesh711):

I am super excited and we are already shipping!! DM me if interested in buying Positron AI racks. Hugging Face transformers run out of the box at >3x perf/$ and perf/watt today for <70B models of llama, gemma, phi and mistral (deepseek r1 soon - distilled models run today).

Zhengyao Jiang (@zhengyaojiang):

Training LLMs with Reinforcement Learning (RL) isn’t a new idea. So why does it suddenly seem to be working now (o1/DeepSeek)? Here are a few theories and my thoughts on each of them: (1/N)

Positron AI (@positron_ai):

Just 2 weeks after Mitesh joined us as CEO, we are very excited to announce that Positron AI has closed $23.5M in seed funding, with new investors including Valor, Atreides Management, LP, and Flume Ventures. businesswire.com/news/home/2025… Even more exciting announcements ahead!

davidad 🎇 (@davidad):

Frog put the CoT in a stop_gradient() box. “There,” he said. “Now there will not be any optimization pressure on the CoT.” “But there is still selection pressure,” said Toad. “That is true,” said Frog.
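
(A toy sketch of the point, my own illustration rather than anything from the thread: even if no gradient flows through the chain of thought, picking the best of n sampled CoTs by reward still shifts their distribution, so selection pressure survives without optimization pressure. The sample scores and reward function below are invented purely for illustration.)

import Data.List (maximumBy)
import Data.Ord (comparing)

-- stand-in "chain of thought" samples: each is just a score here
cots :: [Double]
cots = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.8, 0.6]

-- an invented reward; nothing is differentiated anywhere
reward :: Double -> Double
reward = id

-- best-of-n selection over consecutive groups of n samples
bestOfN :: Int -> [Double] -> [Double]
bestOfN _ [] = []
bestOfN n xs = let (grp, rest) = splitAt n xs
               in maximumBy (comparing reward) grp : bestOfN n rest

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
  putStrLn ("mean CoT score before selection: " ++ show (mean cots))
  putStrLn ("mean after best-of-4 selection:  " ++ show (mean (bestOfN 4 cots)))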

Edward Kmett (@kmett):

I originally wrote this library to er.. re-answer a stack overflow question. It has since been used for everything from analyzing high energy physics data to machine learning to computer graphics to pick and place machines to keeping flying cars aloft with people in them. The

Edward Kmett (@kmett):

It seems the Disney live-action remake pipeline has finally made it to Pixels (2015). Bold choice not to have Adam Sandler reprise his role.

Edward Kmett (@kmett):

Today I used Sutherland's logical effort to reason about prefix adders. With it I found g (logical effort), p (parasitic delay), and h (load), and applied them to paths for g (generate), p (propagate), and h (Ling-style pseudo-carries). No notational confusion ensued. *cough* None.
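
(For reference, a minimal sketch of the logical-effort arithmetic, Sutherland's g/p/h rather than the adder's: a stage has delay d = g*h + p, and an N-stage path with path effort F = G*B*H has minimum delay N*F^(1/N) + P. The gate values below are the usual textbook static-CMOS numbers, used purely as an example and not taken from the tweet.)

-- stage parameters: logical effort g and parasitic delay p
data Stage = Stage { g :: Double, p :: Double }

-- minimum delay (in units of tau) of a path through the given stages,
-- for total electrical effort h = Cout/Cin and branching effort b:
--   F = G * B * H,  Dmin = N * F^(1/N) + P
minPathDelay :: [Stage] -> Double -> Double -> Double
minPathDelay stages h b = n * f ** (1 / n) + sumP
  where
    n    = fromIntegral (length stages)
    bigG = product (map g stages)   -- path logical effort G
    f    = bigG * b * h             -- path effort F
    sumP = sum (map p stages)       -- total parasitic delay P

-- textbook example: inverter, NAND2, NOR2 driving a fanout of 10
main :: IO ()
main = print (minPathDelay [Stage 1 1, Stage (4/3) 2, Stage (5/3) 2] 10 1)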

Edward Kmett (@kmett):

It is incredible just how disjoint the support for SystemVerilog language features is between, say, Verilator and Genus. Verilator: What is a parameterized function or type? The only way you can parameterize either of those is if you shove that in an interface and put the

Edward Kmett (@kmett):

One problem with leaning so hard on closed models is that on some days they just take stupid pills. Maybe it is throttling, some kind of meta-level change in the way they do chain-of-thought, whatever. Who knows? The model can't tell me. Today it seems ChatGPT is "being the

Edward Kmett (@kmett):

I can't be the only one who gets this UI bug in ChatGPT almost all the time. It gets stuck thinking it's talking, but it has finished talking. I can't interrupt it with the stop button because it's not talking. And I can't respond because it thinks it is. The chat dies,
