Federico Cassano (@ellev3n11)'s Twitter Profile
Federico Cassano

@ellev3n11

training agents @cursor_ai

prev @neu_prl, @scale_AI, @Roblox, @trailofbits

ID: 1300805399508066305

Link: https://federico.codes/

Joined: 01-09-2020 14:39:24

299 Tweets

1.1K Followers

194 Following

Federico Cassano (@ellev3n11)

has someone built a better pytorch memory_viz? pytorch.org/memory_viz this one crashes with large snapshots and has weird UI bugs that hide the stack trace

Aman Sanger (@amanrsanger)

Cursor writes almost 1 billion lines of accepted code a day. To put it in perspective, the entire world produces just a few billion lines a day.

Prime Intellect (@primeintellect)

Introducing PCCL, the Prime Collective Communications Library — a low-level communication library built for decentralized training over the public internet, with fault tolerance as a core design principle. In testing, PCCL achieves up to 45 Gbit/s of bandwidth across datacenters

Federico Cassano (@ellev3n11)

not bullish on the diffusion models. they are much more expensive to train; only give benefits on decode speed. the GB200 NVL72 + distributed GEMMs + speculation will just solve decode bottleneck for big AR models.

Federico Cassano (@ellev3n11)

BugBot has saved me from so many bugs that missed human review, people should try it! I found it to be especially useful for figuring out buggy edge cases in complex logic.

Federico Cassano (@ellev3n11)

careful in updating transformers. the new version puts the chat template in some new file in the model directory, not in tokenizer_config.json; big breaking change
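
One way to check which layout a given model directory uses is sketched below. It uses only the standard library and does not call transformers itself. The standalone file names (`chat_template.jinja`, `chat_template.json`) are assumptions, not taken from the tweet, and the helper `find_chat_template` is hypothetical; verify the names against the transformers version you are running.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed names for a standalone chat template file; the tweet only says
# "some new file", so check these against your transformers version.
STANDALONE_TEMPLATE_FILES = ("chat_template.jinja", "chat_template.json")


def find_chat_template(model_dir: str) -> Optional[str]:
    """Return the file name that holds the chat template, or None if not found.

    Hypothetical helper: it only inspects files on disk.
    """
    root = Path(model_dir)

    # Newer layout (assumed): the template lives in its own file.
    for name in STANDALONE_TEMPLATE_FILES:
        if (root / name).is_file():
            return name

    # Older layout: the template is a "chat_template" key in tokenizer_config.json.
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if config.get("chat_template"):
            return "tokenizer_config.json"

    return None
```

Running this over a checkpoint directory before and after an upgrade shows whether downstream tooling that reads only `tokenizer_config.json` would stop seeing the template.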

tenderizzation (@tenderizzation)

the four blackwells in a GB200 node when the CPU isn’t bothering them (they’re all replaying CUDA graph captures)