Adam Santoro (@santoroai)'s Twitter Profile
Adam Santoro

@santoroai

Research Scientist in artificial intelligence at DeepMind

ID: 733640801914343425

Joined: 20-05-2016 12:49:32

1.1K Tweets

9.9K Followers

227 Following

Adam Santoro (@santoroai)

Have we ever considered the doomsday AI scenario where an alien superintelligent AI is so smart that it figures out how to wormhole its way to our planet and demolishes us all to maximize its paperclip count? Should have happened by now, no?

Adam Santoro (@santoroai)

There's no better sign that ideas have stagnated than when any new piece of evidence is either strongly in your favor, or else must be flawed in some way. Coincidentally, this is how conspiracy theories work.

Andrew Lampinen (@andrewlampinen)

Abstract reasoning is ideally independent of content. Language models do not achieve this standard, but neither do humans. In a new paper (arxiv.org/abs/2207.07051, co-led by Ishita Dasgupta) we show that LMs in fact mirror classic human patterns of content effects on reasoning. 1/

Adam Santoro (@santoroai)

Interesting focus on architecture in many responses. My list is less about what's new, and more about what proved to be important: 1) Interactive experience 2) Embracing emergence 3) Engaging with real world complexity 4) The consequences of being socially embedded

Nat McAleese (@__nmca__)

I can finally talk about Sparrow! My team used RL to train a 70 billion parameter model to be simultaneously safer and more helpful… (1/5)

Andrew Lampinen (@andrewlampinen)

I'm not a scaling maximalist, but it's surprising to me how many people are 1) interested in differences between human and artificial intelligence and 2) think scaling to improve performance means deep learning is doing something fundamentally wrong. 1/n

Peter Humphreys (@p_humphreys)

Standard transformer-based language models use the same amount of compute for each token. Our new method, which we call Mixture-of-Depths, allows transformers to instead learn to dynamically allocate compute to specific positions in a sequence. arxiv.org/abs/2404.02258
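
A minimal sketch of the idea (an editorial illustration in PyTorch, not the paper's code): a learned router scores every token, and only a fixed fraction of tokens per sequence is processed by the block, while the rest skip it along the residual path. The class name MoDBlock and the 12.5% capacity default are illustrative assumptions.

import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Illustrative Mixture-of-Depths-style wrapper (hypothetical, not the authors' code)."""
    def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.125):
        super().__init__()
        self.block = block            # any transformer block mapping (B, T, D) -> (B, T, D)
        self.router = nn.Linear(d_model, 1)
        self.capacity = capacity      # fraction of tokens that receive full compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(self.capacity * T))
        scores = self.router(x).squeeze(-1)                 # (B, T) per-token router scores
        top_idx = scores.topk(k, dim=-1).indices            # (B, k) tokens chosen for compute
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, D)
        selected = torch.gather(x, 1, gather_idx)           # (B, k, D) routed tokens
        gate = torch.sigmoid(torch.gather(scores, 1, top_idx)).unsqueeze(-1)
        update = gate * self.block(selected)                # gating keeps the routing decision differentiable
        out = x.clone()                                     # unselected tokens pass through unchanged
        out.scatter_(1, gather_idx, selected + update)      # residual update only for routed tokens
        return out

With capacity 0.125, each wrapped block runs attention and the MLP on only 12.5% of positions, which is where the speedups discussed below come from.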

Alex Hägele (@haeggee)

... and with one experiment, I was able to roughly reproduce their results for a ~220M GPT-2. It gives a speedup of ~20min (80min dense vs 60min MoD, 4 A100s) while keeping the pplx close. This roughly matches Fig. 3 or 4 in the paper arxiv.org/pdf/2404.02258…

George Grigorev (@iamgrigorev)

I have implemented Mixture-of-Depths and it shows a significant memory reduction during training and a 10% speed increase. I will verify whether it achieves the same quality with 12.5% active tokens. Code: github.com/thepowerfuldee… Thanks to Alex Hägele for the initial code.

Joey (e/λ) (@shxf0072)

Mixture-of-Depths works for a 300M model at sequence length 512: it's faster and achieves better loss. Code: github.com/joey00072/ohar… Writeup: huggingface.co/blog/joey00072…

Michael Chang (@mmmbchang)

Gemini and I also got a chance to watch the OpenAI live announcement of gpt4o, using Project Astra! Congrats to the OpenAI team, super impressive work!

finbarr (@finbarrtimbers)

reading the "mixture of depths" paper, which comes up with a novel way to conditionally apply compute depth-wise in a decoder basically they use standard MoE-style expert-choice routing but they use it to choose which tokens get to go through every block in the decoder

reading the "mixture of depths" paper, which comes up with a novel way to conditionally apply compute depth-wise in a decoder

basically they use standard MoE-style expert-choice routing but they use it to choose which tokens get to go through every block in the decoder
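For context, a rough sketch of expert-choice routing itself (an editorial illustration of a common formulation, not any specific library's API): each expert, rather than each token, picks the top-c tokens it will process, so per-expert compute is fixed. Mixture-of-Depths reuses this idea with a single "expert" per layer, namely the block itself.

import torch

def expert_choice_assign(router_logits: torch.Tensor, capacity: int):
    # router_logits: (num_tokens, num_experts) scores from a learned router.
    # Each expert (column) selects the `capacity` tokens with the highest affinity,
    # so every expert processes exactly `capacity` tokens regardless of token preferences.
    affinities = router_logits.softmax(dim=-1)               # token-to-expert affinities
    weights, token_idx = affinities.topk(capacity, dim=0)    # top tokens per expert: (capacity, num_experts)
    return token_idx, weights

# Example: 8 tokens, 2 experts, each expert takes its 3 preferred tokens.
token_idx, weights = expert_choice_assign(torch.randn(8, 2), capacity=3)
print(token_idx.shape)   # torch.Size([3, 2])

Because the selection is made per expert rather than per token, no token can overload an expert, which is the same property MoD exploits to fix the per-layer compute budget.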