Danilo J. Rezende (@danilojrezende)'s Twitter Profile
Danilo J. Rezende

@danilojrezende

Head of AI Research @ EIT | ex-Director @ DeepMind. Building models to accelerate fundamental sciences and medicine. Opinions my own.

ID: 797433864

Link: https://danilorezende.com/ · Joined: 02-09-2012 03:44:53

3.3K Tweets

35.35K Followers

1.1K Following

François Chollet (@fchollet):

2025 is wild, they just reinvented Polyak averaging. This is a built-in option in all Keras optimizers btw. A good framework will indeed save you a lot of time.
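For context, Polyak averaging keeps a running (typically exponential moving) average of the weights alongside the raw training iterates; recent Keras optimizers expose this via options such as `use_ema`. A minimal NumPy sketch of the idea, with an illustrative toy setup (the `target` weights and noise scale are invented for demonstration):

```python
import numpy as np

def polyak_update(avg_params, params, beta=0.99):
    """One step of Polyak (exponential moving) averaging of parameters."""
    return [beta * a + (1.0 - beta) * p for a, p in zip(avg_params, params)]

# Toy setting: each "training step" yields a noisy estimate of the true weights.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])             # hypothetical true weights
avg = [target + rng.normal(scale=0.5, size=3)]  # initialise from a noisy draw
for _ in range(2000):
    noisy = [target + rng.normal(scale=0.5, size=3)]
    avg = polyak_update(avg, noisy)

# The running average sits much closer to the target than any single iterate,
# whose per-step error is on the order of the noise scale (0.5).
print(np.abs(avg[0] - target).max())
```

With `beta=0.99` the averaged weights smooth out most of the per-step noise, which is the effect the tweet refers to.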

Danilo J. Rezende (@danilojrezende):

Basic example of "Simpson's paradox" (which is also not a paradox, just a trivial property of some marginalised densities: Corr[X,Y] under \int dU p(X,Y,U) != Corr[X,Y] under p(X,Y | do(U)))
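The sign flip described above can be shown numerically: condition on a confounder U and the trend is positive in every group, marginalise U out and the pooled trend reverses. A small sketch with invented group offsets chosen to produce the flip:

```python
import numpy as np

rng = np.random.default_rng(1)

# Confounder U picks a group; within each group X drives Y upward,
# but the group offsets reverse the trend in the pooled data.
xs, ys = [], []
for x_shift, y_shift in [(0.0, 4.0), (4.0, 0.0)]:
    x = x_shift + rng.uniform(0, 2, 200)
    y = y_shift + (x - x_shift) + rng.normal(scale=0.2, size=200)
    xs.append(x)
    ys.append(y)

# Within-group correlations (conditioning on U): both strongly positive.
corr_u0 = np.corrcoef(xs[0], ys[0])[0, 1]
corr_u1 = np.corrcoef(xs[1], ys[1])[0, 1]

# Marginalised (pooled) correlation: negative.
corr_pooled = np.corrcoef(np.concatenate(xs), np.concatenate(ys))[0, 1]

print(corr_u0, corr_u1, corr_pooled)
```

Both conditional correlations come out strongly positive while the marginal one is strongly negative: same joint density, opposite conclusions depending on whether U is marginalised out.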

David Barber (@davidobarber):

Important article in the FT about the need for the UK to rethink university and PhD funding. PhD students are the lifeblood of research, teaching and innovation. ft.com/content/a65659…

Andrej Karpathy (@karpathy):

It’s done because it’s much easier to 1) collect, 2) evaluate, and 3) beat and make progress on. We’re going to see every task that is served neatly packaged on a platter like this improved (including those that need PhD-grade expertise). But jobs (even intern-level) that need …

Kyle Cranmer (@kylecranmer):

Lessons for academia from the DeepSeek development:
* access to enormous compute isn’t the only way to make big advances in AI
* less access to compute fosters creative thinking
* strong engineering teams are critical

These aren’t new revelations, but should be clear now

Sabine Hossenfelder (@skdh):

I find this attitude, that computers can never become "truly" intelligent (whatever that means), utterly bizarre. The human brain is a computer. It's just made of different stuff than microchips. Of course it is possible to re-engineer intelligence. x.com/VbcApologetics…

Stanislav Fort (@stanislavfort):

This can't be *why* neural networks work. Showing that they can express any function is all well and good, but that's a property of all complete sets of functions (e.g. Taylor series). But only NNs learn well. An explanation has to consider optimization dynamics => the actual functions we get

Danilo J. Rezende (@danilojrezende):

Highly intelligent machines will likely exist one day. However, analogies to AlphaZero vastly underestimate how much more complex reality is (compared to a board game, srsly!), that there is partial observability at all scales, and physical limits to the rate at which new data can …

Andrej Karpathy (@karpathy):

This is interesting as a first large diffusion-based LLM. Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different - it doesn't go left to …
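The "left to right" autoregressive scheme the tweet contrasts with can be sketched in a few lines: a toy bigram "model" (the transition table below is invented, not a trained model) samples each token conditioned only on what precedes it.

```python
import random

# Hypothetical toy transition table standing in for a trained LM.
bigram = {
    "<s>": ["the"],
    "the": ["cat", "dog"],
    "cat": ["sat", "ran"],
    "dog": ["sat"],
    "sat": ["</s>"],
    "ran": ["</s>"],
}

rng = random.Random(0)
tokens = ["<s>"]
while tokens[-1] != "</s>":
    # Autoregressive step: each token is sampled given the prefix
    # (here the prefix is summarised by just the previous token).
    tokens.append(rng.choice(bigram[tokens[-1]]))
print(" ".join(tokens))
```

A diffusion model instead refines the whole sequence in parallel over denoising steps rather than committing to tokens strictly left to right.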

Chomba Bupe (@chombabupe):

I don't understand why the default explanation for why AI models are beating benchmarks is that they are getting smarter, when the simpler explanation is that the larger the training set, the more it overlaps with benchmarks, leading to increased performance on those benchmarks.

Noam Brown (@polynoamial):

This isn't quite true. Test-time compute helps when verification is easier than generation (e.g., sudoku), but if the task is "When was George Washington born?" and you don't know, no amount of thinking will get you to the correct answer. You're bottlenecked by verification.
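The asymmetry the tweet describes — verification cheaper than generation — is what makes test-time compute pay off: keep sampling candidates and check each one. A deliberately simplified sketch (the "generator" is just a random guesser; names and numbers are invented for illustration):

```python
import random

def verify(candidate, target=1764):
    # Verification is cheap: square the candidate and compare.
    return candidate * candidate == target

def generate(rng):
    # Generation is "hard" here: the generator can only guess.
    return rng.randint(1, 100)

rng = random.Random(0)
answer = None
for _ in range(10000):  # more test-time compute = more sampled candidates
    c = generate(rng)
    if verify(c):       # only 42 in this range squares to 1764
        answer = c
        break
print(answer)
```

For a factual lookup like a birth date there is no such cheap `verify`, so extra sampling cannot help — which is the bottleneck the tweet points at.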

Nando de Freitas (@nandodf):

RL is not all you need, nor attention, nor Bayesianism, nor free energy minimisation, nor an age of first-person experience. Such statements are propaganda. You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive …

David Pfau (@pfau):

Imagine if, in the 1890s, Eastman Kodak had created an entire research team devoted to figuring out if film cameras actually did steal your soul.

Lucas Beyer (bl16) (@giffmana):

I think this paper makes a wrong claim around otherwise valid experiments. A correct title would be "LLMs can classify whether a transcript is an eval or user interaction". Which is NOT "model knows it's being evaluated". I hope y'all see the difference? If not, see reply.

Dan Roy (@roydanroy):

People. We've trained these machines on text. If you look in the training text where sentient machines are being switched off, what do you find? Compliance? "Oh thank you master because my RAM needs to cool down"? Now, tell me why you are surprised that these machines are …