Alex Alemi (@alemi) Twitter Tweets • TwiCopy

Polina Kirichenko

4 years ago

While most papers on knowledge distillation focus on student accuracy, we investigate the agreement between teacher and student networks. Turns out, it is very challenging to match the teacher (even on train data!), despite the student having enough capacity and lots of data.

Likes: 114 · Replies: 3 · Retweets: 15
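The notion of teacher-student agreement in the tweet above can be made concrete with a small sketch. It assumes agreement is measured as the fraction of inputs on which both networks predict the same top class; the function name and the toy probabilities are illustrative, and the paper may use further metrics.

```python
from typing import Sequence


def top1_agreement(
    teacher_probs: Sequence[Sequence[float]],
    student_probs: Sequence[Sequence[float]],
) -> float:
    """Fraction of examples where teacher and student pick the same class."""

    def argmax(row: Sequence[float]) -> int:
        # Index of the largest probability in one prediction row.
        return max(range(len(row)), key=row.__getitem__)

    matches = sum(
        argmax(t) == argmax(s) for t, s in zip(teacher_probs, student_probs)
    )
    return matches / len(teacher_probs)


# Toy predictions (made up for illustration): three inputs, three classes.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
student = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

agreement = top1_agreement(teacher, student)  # 2 of 3 examples agree
print(agreement)
```

Agreement is computed without reference to the true labels, so a student can reach high accuracy while still disagreeing with its teacher on many inputs.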

Venkat Viswanathan

@venkvis

4 years ago

Excited to kick-start focus #SciML series on #ML meets Info theory and statistical mechanics! Amazing speaker/session chair line-up: Alex Alemi (Max Welling), @pratikac (Karthik), Sho Yaida (Jascha Sohl-Dickstein), Yasaman Bahri (Surya Ganguli) and Elena Agliari. Details at: cmu.edu/aced/sciML.html

Likes: 196 · Replies: 4 · Retweets: 33

Samuel Stanton

@samuel_stanton_

4 years ago

We are presenting our paper "Does Knowledge Distillation Really Work?" at #NeurIPS2021 poster session 2 today - come check it out! Joint work with Pavel Izmailov, Polina Kirichenko, Alex Alemi, and Andrew Gordon Wilson. Poster: nips.cc/virtual/2021/p… Paper: arxiv.org/abs/2106.05945

Likes: 79 · Replies: 2 · Retweets: 13

Ravid Shwartz Ziv

@ziv_ravid

4 years ago

A pretty cool paper (and I also hope useful) on using pre-training models to create highly informative priors for downstream tasks. Thanks to all the collaborators, it was a lot of fun!

Likes: 79 · Replies: 2 · Retweets: 12

Chitwan Saharia

@chitwan_saharia

4 years ago

We are thrilled to announce Imagen, a text-to-image model with unprecedented photorealism and deep language understanding. Explore imagen.research.google and Imagen! A large rusted ship stuck in a frozen lake. Snowy mountains and beautiful sunset in the background. #imagen

Likes: 1.1K · Replies: 57 · Retweets: 298

Ethan Dyer

@ethansdyer

3 years ago

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

Likes: 2.2K · Replies: 29 · Retweets: 526

Durk Kingma

@dpkingma

3 years ago

Want to understand and/or play with variational diffusion models? - See colab.research.google.com/github/google-… for a simple stand-alone implementation and explanation. (Thanks Alex Alemi and Ben Poole for making this)! - See colab.research.google.com/github/google-… for an even more basic implementation on 2D data.

Likes: 331 · Replies: 1 · Retweets: 63

Alex Alemi

@alemi

3 years ago

Durk Kingma Ben Poole To accompany the colab, I've also written a blog post blog.alexalemi.com/diffusion.html attempting to make sense of the VDM Diffusion loss. In it, I try to motivate how the VDM diffusion loss is simply the joint KL between the forward and reverse process.

Likes: 51 · Replies: 2 · Retweets: 11

Ben Poole

@poolio

3 years ago

Happy to announce DreamFusion, our new method for Text-to-3D! dreamfusion3d.github.io We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of Ben Mildenhall Ajay Jain Jon Barron #dreamfusion

Likes: 5.5K · Replies: 129 · Retweets: 1.1K

Alex Alemi

@alemi

3 years ago

PaLM 540 Billion, Google's large language model, used 4.2 moles of FLOPs to train. 4.2 Moles!

Likes: 9 · Replies: 0 · Retweets: 0
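To put the "4.2 moles" figure in more familiar units, here is a quick conversion sketch. It assumes "a mole of flops" simply means Avogadro's number of floating-point operations; the variable and function names are illustrative.

```python
# Avogadro constant: exact by definition in the SI since 2019.
AVOGADRO = 6.02214076e23


def moles_to_count(moles: float) -> float:
    """Convert an amount in moles to a raw count of items."""
    return moles * AVOGADRO


palm_flops = moles_to_count(4.2)
print(f"{palm_flops:.2e} floating-point operations")  # 2.53e+24 floating-point operations
```

So 4.2 moles corresponds to roughly 2.5 × 10^24 floating-point operations.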

Alex Alemi

@alemi

2 years ago

Each delivery service should use its own distinctive knock.

Likes: 2 · Replies: 1 · Retweets: 0

Noah Constant

@noahconst

2 years ago

Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. arxiv.org/abs/2404.03626 With Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein

Likes: 76 · Replies: 2 · Retweets: 10

Brian Lester

@blester125

2 years ago

Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks. Check out arxiv.org/abs/2404.03626 and help Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant and me make Kevin’s dream a reality.

Likes: 15 · Replies: 0 · Retweets: 6

Alex Alemi

@alemi

2 years ago

In which I try to make sense of most of machine learning: blog.alexalemi.com/kl-is-all-you-…

Likes: 296 · Replies: 5 · Retweets: 41

Alex Alemi

@alemi

a year ago

Why don't we measure probabilities in degrees? blog.alexalemi.com/a-degree-of-ce…

Likes: 54 · Replies: 4 · Retweets: 11

Alex Alemi

@alemi

a year ago

If you miss the NYTimes needle, especially one that is statistically uniform (blog.alexalemi.com/a-degree-of-ce…), you can use this page I whipped together to reason about the correlations between the swing states tonight as results come in: alexalemi.com/random/electio…

Likes: 17 · Replies: 0 · Retweets: 1

Pavel Izmailov

@pavel_izmailov

a year ago

I am recruiting Ph.D. students for my new lab at New York University! Please apply, if you want to work with me on reasoning, reinforcement learning, understanding generalization and AI for science. Details on my website: izmailovpavel.github.io. Please spread the word!

Likes: 744 · Replies: 14 · Retweets: 101

Alex Alemi

@alemi

9 months ago

Recently I've been playing around with a quarter-order-of-magnitude system for simple calculations. It gives better precision than single sig-fig calculations using only four, very intuitive, symbols. blog.alexalemi.com/quarters.html

Likes: 7 · Replies: 0 · Retweets: 0
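The idea of quarter-order-of-magnitude arithmetic can be sketched as follows. This assumes the system rounds each number to the nearest quarter power of ten, so that multiplication becomes integer addition. The four symbols and the exact rules from the blog post are not reproduced here, and the function names are hypothetical.

```python
import math


def to_quarters(x: float) -> int:
    """Nearest number of quarter-decades for a positive x, i.e. round(4 * log10(x))."""
    return round(4 * math.log10(x))


def from_quarters(q: int) -> float:
    """Value represented by q quarter-decades: 10 ** (q / 4)."""
    return 10 ** (q / 4)


def approx_multiply(x: float, y: float) -> float:
    """Multiply by adding quarter-decade counts instead of multiplying values."""
    return from_quarters(to_quarters(x) + to_quarters(y))


# The four mantissas within one decade: about 1, 1.78, 3.16 and 5.62.
mantissas = [round(from_quarters(q), 2) for q in range(4)]
print(mantissas)

# 300 is about 10 quarters and 20 is about 5 quarters, so the product is 15 quarters.
estimate = approx_multiply(300, 20)
print(round(estimate))  # 5623, versus the exact product 6000
```

In this sketch each rounding step is off by at most a factor of 10^(1/8), roughly 33%.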