λndres Mariscal (@serialdev) 's Twitter Profile
λndres Mariscal

@serialdev

Wrote anti-cheat ML, do ML/AI at places you know of and probably use && into graphics||compilers||DBs
I like tech, sloths, and chihuahuas.

ID: 3386323594

Joined: 21-07-2015 19:45:41

3.3K Tweets

328 Followers

3.3K Following

Kostas Anagnostou (@kostasaaa) 's Twitter Profile Photo

Worth a read: "How do I become a graphics programmer? - A small guide from the AMD Game Engineering team": gpuopen.com/learn/how_do_y…

Pavan Jayasinha (@pavanjayasinha) 's Twitter Profile Photo

I implemented an LLM end-to-end in hardware, and ran it on an FPGA.

Zero Python. Zero CUDA. Just pure SystemVerilog.

All my progress + everything I learned from 200h of LLM chip design (demo at the end)👇
Deedy (@deedydas) 's Twitter Profile Photo

DeepSeek just dropped the single best end-to-end paper on large model training.

It covers
— Software (MLA, training in FP8, DeepEP, LogFMT)
— Hardware (Multi-Rail Fat Tree, Ethernet RoCE switches)
— Mix (IBGDA, 3FS filesystem)

DeepSeek's engineering depth is insane. Must read.
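FP8 training hinges on quantizing tensors to 8-bit floats. As a hedged illustration only (this is not DeepSeek's actual recipe; the helper name and the omission of per-tensor scaling are my assumptions), a minimal NumPy sketch of simulated E4M3 rounding:

```python
import numpy as np

def quantize_e4m3(x):
    """Simulate rounding to FP8 E4M3 (4 exponent bits, 3 mantissa bits).

    Hypothetical helper for illustration; real FP8 training kernels
    also maintain per-tensor scaling factors, which are omitted here.
    """
    # E4M3's largest finite value is 448, so saturate there.
    x = np.clip(np.asarray(x, dtype=np.float64), -448.0, 448.0)
    # Exponent of each value, clamped at E4M3's minimum normal exponent (-6).
    exp = np.floor(np.log2(np.maximum(np.abs(x), 2.0 ** -6)))
    # With 3 mantissa bits, representable values are spaced 2^(exp - 3) apart.
    step = 2.0 ** (exp - 3)
    return np.round(x / step) * step
```

For example, 1.1 is not representable in E4M3 and rounds to the nearest representable value, 1.125; values above 448 saturate to 448.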
Nic Fishman (@njwfish) 's Twitter Profile Photo

🚨 New preprint 🚨

We introduce Generative Distribution Embeddings (GDEs) — a framework for learning representations of distributions, not just datapoints.

GDEs enable multiscale modeling and come with elegant statistical theory and some miraculous geometric results!

🧵
Sebastian Aaltonen (@sebaaltonen) 's Twitter Profile Photo

And we are back to triangles for neural graphics :D First it was just a neural network (NeRF), then a sparse octree (NeRF converted to an octree), then Gaussian splats (basically particle splatting), and now triangles :)

Emmanuel Ameisen (@mlpowered) 's Twitter Profile Photo

The methods we used to trace the thoughts of Claude are now open to the public!

Today, we are releasing a library which lets anyone generate graphs which show the internal reasoning steps a model used to arrive at an answer.
Stella Biderman (@blancheminerva) 's Twitter Profile Photo

Two years in the making, we finally have 8 TB of openly licensed data with document-level metadata for authorship attribution, licensing details, links to original copies, and more. Hugely proud of the entire team.

Luca Ambrogioni (@lucaamb) 's Twitter Profile Photo

1/2) It's finally out on Arxiv: Feedback guidance of generative diffusion models!

We derived an adaptive guidance method from first principles that regulates the amount of guidance based on the model's current state.

Complex prompts are highly guided while simple ones are almost free.
Geoff Langdale (@geofflangdale) 's Twitter Profile Photo

I'm working on a good heuristic to put would-be tech influencers on mute/block (depending on obnoxiousness). Current idea is >10K (5K?) followers without any discernible achievements, or some scaled version of the same for people with *some* achievements but who have clearly ...

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

Take the LLaMA 3 paper for another example. I know (from personal experience and talking to others) that many authors of this paper endorse the above view. And yet, not a single model in their scaling laws plots is that large! (7B / 1T = 4.2e22 FLOP)

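The FLOP figure in parentheses follows the standard training-compute approximation C ≈ 6·N·D, where N is the parameter count and D the number of training tokens; a quick check:

```python
# Standard training-compute approximation: C ≈ 6 * N * D,
# with N = parameter count and D = training tokens.
N = 7e9    # 7B parameters
D = 1e12   # 1T tokens
C = 6 * N * D
print(f"{C:.1e}")  # prints 4.2e+22, matching the figure in the tweet
```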
Casper Hansen (@casper_hansen_) 's Twitter Profile Photo

Ex-DeepSeek author of Native Sparse Attention won the best paper award at ACL.

I was lucky enough to attend a live lecture where he revealed:
- scaling up context length to 1 million
- this will be in the next frontier model

There is good reason to believe DeepSeek V4 will use NSA.