Danilo J. Rezende (@danilojrezende)'s Twitter Profile
Danilo J. Rezende

@danilojrezende

Head of AI Research @ EIT | ex-Director @ DeepMind. Building models to accelerate fundamental sciences and medicine. Opinions my own.

ID: 797433864

Link: https://danilorezende.com/ · Joined: 02-09-2012 03:44:53

3.3K Tweets

35.35K Followers

1.1K Following

François Chollet (@fchollet):

2025 is wild, they just reinvented Polyak averaging. This is a built-in option in all Keras optimizers btw. A good framework will indeed save you a lot of time.
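For context, Polyak averaging keeps a running (typically exponential moving) average of the weights alongside the raw training iterates; recent Keras optimizers expose this via options such as `use_ema`. A minimal NumPy sketch of the idea, with an illustrative toy setup (the `target` weights and noise scale are invented for demonstration):

```python
import numpy as np

def polyak_update(avg_params, params, beta=0.99):
    """One step of Polyak (exponential moving) averaging of parameters."""
    return [beta * a + (1.0 - beta) * p for a, p in zip(avg_params, params)]

# Toy setting: each "training step" yields a noisy estimate of the true weights.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])             # hypothetical true weights
avg = [target + rng.normal(scale=0.5, size=3)]  # initialise from a noisy draw
for _ in range(2000):
    noisy = [target + rng.normal(scale=0.5, size=3)]
    avg = polyak_update(avg, noisy)

# The running average sits much closer to the target than any single iterate,
# whose per-step error is on the order of the noise scale (0.5).
print(np.abs(avg[0] - target).max())
```

With `beta=0.99` the averaged weights smooth out most of the per-step noise, which is the effect the tweet refers to.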

Danilo J. Rezende (@danilojrezende):

Basic example of "Simpson's paradox" (which is also not a paradox, just a trivial property of some marginalised densities: Corr[X,Y] under \int dU p(X,Y,U) != Corr[X,Y] under p(X,Y | do(U)))
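The sign flip described above can be shown numerically: condition on a confounder U and the trend is positive in every group, marginalise U out and the pooled trend reverses. A small sketch with invented group offsets chosen to produce the flip:

```python
import numpy as np

rng = np.random.default_rng(1)

# Confounder U picks a group; within each group X drives Y upward,
# but the group offsets reverse the trend in the pooled data.
xs, ys = [], []
for x_shift, y_shift in [(0.0, 4.0), (4.0, 0.0)]:
    x = x_shift + rng.uniform(0, 2, 200)
    y = y_shift + (x - x_shift) + rng.normal(scale=0.2, size=200)
    xs.append(x)
    ys.append(y)

# Within-group correlations (conditioning on U): both strongly positive.
corr_u0 = np.corrcoef(xs[0], ys[0])[0, 1]
corr_u1 = np.corrcoef(xs[1], ys[1])[0, 1]

# Marginalised (pooled) correlation: negative.
corr_pooled = np.corrcoef(np.concatenate(xs), np.concatenate(ys))[0, 1]

print(corr_u0, corr_u1, corr_pooled)
```

Both conditional correlations come out strongly positive while the marginal one is strongly negative: same joint density, opposite conclusions depending on whether U is marginalised out.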

David Barber (@davidobarber):

Important article in the FT about the need for the UK to rethink university and PhD funding. PhD students are the lifeblood of research, teaching and innovation. ft.com/content/a65659…

Andrej Karpathy (@karpathy):

It’s done because it’s much easier to 1) collect, 2) evaluate, and 3) beat and make progress on. We’re going to see every task that is served neatly packaged on a platter like this improved (including those that need PhD-grade expertise). But jobs (even intern-level) that need …

Kyle Cranmer (@kylecranmer):

Lessons for academia from the DeepSeek development:
* access to enormous compute isn’t the only way to make big advances in AI
* less access to compute fosters creative thinking
* strong engineering teams are critical

These aren’t new revelations, but should be clear now

Sabine Hossenfelder (@skdh):

I find this attitude, that computers can never become "truly" intelligent (whatever that means), utterly bizarre. The human brain is a computer. It's just made of different stuff than microchips. Of course it is possible to re-engineer intelligence. x.com/VbcApologetics…

Stanislav Fort (@stanislavfort):

This can't be *why* neural networks work. Showing that they can express any function is all well and good, but that's a property of all complete sets of functions (e.g. Taylor series). But only NNs learn well. An explanation has to consider optimization dynamics => the actual functions we get

Danilo J. Rezende (@danilojrezende):

Highly intelligent machines will likely exist one day. However, analogies to AlphaZero vastly underestimate how much more complex reality is (compared to a board game, srsly!), that there is partial observability at all scales, and physical limits to the rate at which new data can …

Andrej Karpathy (@karpathy):

This is interesting as a first large diffusion-based LLM. Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different - it doesn't go left to …
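The "left to right" autoregressive scheme the tweet contrasts with can be sketched in a few lines: a toy bigram "model" (the transition table below is invented, not a trained model) samples each token conditioned only on what precedes it.

```python
import random

# Hypothetical toy transition table standing in for a trained LM.
bigram = {
    "<s>": ["the"],
    "the": ["cat", "dog"],
    "cat": ["sat", "ran"],
    "dog": ["sat"],
    "sat": ["</s>"],
    "ran": ["</s>"],
}

rng = random.Random(0)
tokens = ["<s>"]
while tokens[-1] != "</s>":
    # Autoregressive step: each token is sampled given the prefix
    # (here the prefix is summarised by just the previous token).
    tokens.append(rng.choice(bigram[tokens[-1]]))
print(" ".join(tokens))
```

A diffusion model instead refines the whole sequence in parallel over denoising steps rather than committing to tokens strictly left to right.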

Chomba Bupe (@chombabupe):

I don't understand why the default explanation for why AI models are beating benchmarks is that they are getting smarter, when the simpler explanation is that the larger the training set, the more it overlaps with benchmarks, leading to increased performance on those benchmarks.

Noam Brown (@polynoamial):

This isn't quite true. Test-time compute helps when verification is easier than generation (e.g., sudoku), but if the task is "When was George Washington born?" and you don't know, no amount of thinking will get you to the correct answer. You're bottlenecked by verification.
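The asymmetry the tweet describes — verification cheaper than generation — is what makes test-time compute pay off: keep sampling candidates and check each one. A deliberately simplified sketch (the "generator" is just a random guesser; names and numbers are invented for illustration):

```python
import random

def verify(candidate, target=1764):
    # Verification is cheap: square the candidate and compare.
    return candidate * candidate == target

def generate(rng):
    # Generation is "hard" here: the generator can only guess.
    return rng.randint(1, 100)

rng = random.Random(0)
answer = None
for _ in range(10000):  # more test-time compute = more sampled candidates
    c = generate(rng)
    if verify(c):       # only 42 in this range squares to 1764
        answer = c
        break
print(answer)
```

For a factual lookup like a birth date there is no such cheap `verify`, so extra sampling cannot help — which is the bottleneck the tweet points at.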

Nando de Freitas (@nandodf):

RL is not all you need, nor attention, nor Bayesianism, nor free energy minimisation, nor an age of first-person experience. Such statements are propaganda. You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive …

David Pfau (@pfau):

Imagine if, in the 1890s, Eastman Kodak had created an entire research team devoted to figuring out if film cameras actually did steal your soul.

Lucas Beyer (bl16) (@giffmana):

I think this paper makes a wrong claim around otherwise valid experiments. A correct title would be "LLMs can classify whether a transcript is an eval or user interaction". Which is NOT "model knows it's being evaluated". I hope y'all see the difference? If not, see reply.

Dan Roy (@roydanroy):

People. We've trained these machines on text. If you look in the training text where sentient machines are being switched off, what do you find? Compliance? "Oh thank you master because my RAM needs to cool down"? Now, tell me why you are surprised that these machines are …