akira (@realmcore_) Twitter Tweets • TwiCopy

akira

9 months ago

A long time ago, a very smart researcher told me to look for the things unsaid to describe what the labs are doing. In this case, it seems their product team is truly building an agent, and by extension, we will finally see whether or not the app layer optionality thesis holds.

Likes: 18 · Replies: 0 · Reposts: 0

Edward Z. Yang

@ezyang

9 months ago

I regret to inform you that once I have actually paged a codebase into my head, it is faster for me to make changes than it is to ask Claude to do it

Likes: 1.1K · Replies: 56 · Reposts: 59

akira

@realmcore_

9 months ago

interestingly, it may be volume of data rather than diversity due to multi-modality that makes human intelligence more compressive and therefore more "intelligent" when compared to deep neural nets.

Likes: 2 · Replies: 1 · Reposts: 0

akira

@realmcore_

9 months ago

so glad anthropic released claude-sonnet-3-7-rate-limit

Likes: 3 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

so we're getting gpt 4.5 today right?

Likes: 5 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

to all wondering why the benchmark scores are lower than the o3 scores. please read the following:

Likes: 1 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

I wouldn't be surprised if o3 is a distill off of an early 4.5 checkpoint with their reasoning/rl stack to finetune

Likes: 1 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

kind of funny that everyone is making these images and freaking out when it's pretty clear it was just trained on more SVG data. Same goes for task variety. There's a frontier of diverse tasks to eval on, so of course every version should support increasing diversity. still cool

Likes: 2 · Replies: 1 · Reposts: 0

akira

@realmcore_

9 months ago

sonnet 3.5 might be peak unfortunately

Likes: 8 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

Imagine if this was bait for anthropic to drop a swe-agent, then oai releases the full version of theirs to crush whatever ace anthropic has up their sleeve. Also possible that 4.5 is bait for new opus to open up space for 4.5 + reasoning as pro mode 'o4'

Likes: 10 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

Claude 3.7 for code is kind of terrible ~50% of the time. The code it writes feels like ancient legacy java and I'm not even using it for java.

Likes: 3 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

I sort of agree with this but also don't. Pretraining will likely have to shift to a bunch of curated, useful data. We'll need grounded data. The current paradigm is pretty much filtered common crawl which I agree will not continue. Data efficiency go "brrrr" as they say

Likes: 3 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

The main ideas behind practically ALL agent companies are:
1_ how you spend compute >>> how much compute you spend
2_ distribution >>> tech as n -> infinity
if you don't have your own models, you're just running an arbitrage play over these two ideas

Likes: 2 · Replies: 0 · Reposts: 0

akira

@realmcore_

9 months ago

Every company in this picture is actually just claude-sonnet btw

Likes: 15 · Replies: 0 · Reposts: 0

Dcai

@_dcai

8 months ago

A few days ago, we signed a lease for an absolutely beautiful office space in the heart of San Francisco. It's got high ceilings, hugeee windows, and a rooftop with a gorgeous view of the SF skyline. But this space isn't for us. It's not an office. The plan was never to just

Likes: 147 · Replies: 25 · Reposts: 12

Lisan al Gaib

@scaling01

8 months ago

Sonnet 3.5 was a 1 in a billion run

Likes: 1.1K · Replies: 52 · Reposts: 52

Y Combinator

@ycombinator

6 months ago

Simplex (Simplex) builds developer-first web agents that companies use to integrate with legacy portals. They're already in production, dispatching freight shipments, downloading customers’ invoices, and fetching websites’ internal APIs. ycombinator.com/launches/NbM-s… Congrats on

Likes: 137 · Replies: 19 · Reposts: 30

Morph

@morph_labs

6 months ago

We are excited to announce Trinity, an autoformalization system for verified superintelligence that we have developed at Morph. We have used it to automatically formalize in Lean a classical result of de Bruijn that the abc conjecture is true almost always.

Likes: 375 · Replies: 10 · Reposts: 50