Adam Santoro (@santoroai)'s Twitter Profile
Adam Santoro

@santoroai

Research Scientist in artificial intelligence at DeepMind

ID: 733640801914343425

Joined: 20-05-2016 12:49:32

1.1K Tweets

9.9K Followers

227 Following

Adam Santoro (@santoroai)

Have we ever considered the doomsday AI scenario where an alien superintelligent AI is so smart that it figures out how to wormhole its way to our planet and demolishes us all to maximize its paperclip count? Should have happened by now, no?

Adam Santoro (@santoroai)

There's no better sign that ideas have stagnated than when any new piece of evidence is either strongly in your favor, or else must be flawed in some way. Coincidentally, this is how conspiracy theories work.

Andrew Lampinen (@andrewlampinen)

Abstract reasoning is ideally independent of content. Language models do not achieve this standard, but neither do humans. In a new paper (arxiv.org/abs/2207.07051, co-led by Ishita Dasgupta) we show that LMs in fact mirror classic human patterns of content effects on reasoning. 1/

Adam Santoro (@santoroai)

Interesting focus on architecture in many responses. My list is less about what's new, and more about what proved to be important: 1) Interactive experience 2) Embracing emergence 3) Engaging with real world complexity 4) The consequences of being socially embedded

Nat McAleese (@__nmca__)

I can finally talk about Sparrow! My team used RL to train a 70 billion parameter model to be simultaneously safer and more helpful… (1/5)

Andrew Lampinen (@andrewlampinen)

I'm not a scaling maximalist, but it's surprising to me how many people are 1) interested in differences between human and artificial intelligence and 2) think scaling to improve performance means deep learning is doing something fundamentally wrong. 1/n

Peter Humphreys (@p_humphreys)

Standard transformer-based language models use the same amount of compute for each token. Our new method, which we call Mixture-of-Depths, allows transformers to instead learn to dynamically allocate compute to specific positions in a sequence. arxiv.org/abs/2404.02258
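
A minimal sketch of the idea (an editorial illustration in PyTorch, not the paper's code): a learned router scores every token, and only a fixed fraction of tokens per sequence is processed by the block, while the rest skip it along the residual path. The class name MoDBlock and the 12.5% capacity default are illustrative assumptions.

import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Illustrative Mixture-of-Depths-style wrapper (hypothetical, not the authors' code)."""
    def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.125):
        super().__init__()
        self.block = block            # any transformer block mapping (B, T, D) -> (B, T, D)
        self.router = nn.Linear(d_model, 1)
        self.capacity = capacity      # fraction of tokens that receive full compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(self.capacity * T))
        scores = self.router(x).squeeze(-1)                 # (B, T) per-token router scores
        top_idx = scores.topk(k, dim=-1).indices            # (B, k) tokens chosen for compute
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, D)
        selected = torch.gather(x, 1, gather_idx)           # (B, k, D) routed tokens
        gate = torch.sigmoid(torch.gather(scores, 1, top_idx)).unsqueeze(-1)
        update = gate * self.block(selected)                # gating keeps the routing decision differentiable
        out = x.clone()                                     # unselected tokens pass through unchanged
        out.scatter_(1, gather_idx, selected + update)      # residual update only for routed tokens
        return out

With capacity 0.125, each wrapped block runs attention and the MLP on only 12.5% of positions, which is where the speedups discussed below come from.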

Alex Hägele (@haeggee)

... and with one experiment, I was able to roughly reproduce their results for a ~220M GPT-2. It gives a speedup of ~20min (80min dense vs 60min MoD, 4 A100s) while keeping the pplx close. This roughly matches Fig. 3 or 4 in the paper arxiv.org/pdf/2404.02258…

George Grigorev (@iamgrigorev)

I have implemented Mixture-of-Depths and it shows a significant memory reduction during training and a 10% speed increase. I will verify whether it achieves the same quality with 12.5% active tokens. Code: github.com/thepowerfuldee… Thanks to Alex Hägele for the initial code.

Joey (e/λ) (@shxf0072)

Mixture-of-Depths works for a 300M model at sequence length 512: it's faster and achieves better loss. Code: github.com/joey00072/ohar… Writeup: huggingface.co/blog/joey00072…

Michael Chang (@mmmbchang)

Gemini and I also got a chance to watch the OpenAI live announcement of gpt4o, using Project Astra! Congrats to the OpenAI team, super impressive work!

finbarr (@finbarrtimbers)

reading the "mixture of depths" paper, which comes up with a novel way to conditionally apply compute depth-wise in a decoder basically they use standard MoE-style expert-choice routing but they use it to choose which tokens get to go through every block in the decoder

reading the "mixture of depths" paper, which comes up with a novel way to conditionally apply compute depth-wise in a decoder

basically they use standard MoE-style expert-choice routing but they use it to choose which tokens get to go through every block in the decoder
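For context, a rough sketch of expert-choice routing itself (an editorial illustration of a common formulation, not any specific library's API): each expert, rather than each token, picks the top-c tokens it will process, so per-expert compute is fixed. Mixture-of-Depths reuses this idea with a single "expert" per layer, namely the block itself.

import torch

def expert_choice_assign(router_logits: torch.Tensor, capacity: int):
    # router_logits: (num_tokens, num_experts) scores from a learned router.
    # Each expert (column) selects the `capacity` tokens with the highest affinity,
    # so every expert processes exactly `capacity` tokens regardless of token preferences.
    affinities = router_logits.softmax(dim=-1)               # token-to-expert affinities
    weights, token_idx = affinities.topk(capacity, dim=0)    # top tokens per expert: (capacity, num_experts)
    return token_idx, weights

# Example: 8 tokens, 2 experts, each expert takes its 3 preferred tokens.
token_idx, weights = expert_choice_assign(torch.randn(8, 2), capacity=3)
print(token_idx.shape)   # torch.Size([3, 2])

Because the selection is made per expert rather than per token, no token can overload an expert, which is the same property MoD exploits to fix the per-layer compute budget.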