Jaehoon Lee (@hoonkp)'s Twitter Profile
Jaehoon Lee

@hoonkp

Researcher in machine learning with background in physics; Member of Technical Staff @AnthropicAI; Prev. Research scientist @GoogleDeepMind/@GoogleBrain.

ID: 90276706

Link: http://jaehlee.github.io | Joined: 15-11-2009 23:47:33

242 Tweets

1.1K Followers

662 Following

Ethan Dyer (@ethansdyer)'s Twitter Profile Photo

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

Lilian Weng (@lilianweng)'s Twitter Profile Photo

🧮 I finally spent some time learning what exactly the Neural Tangent Kernel (NTK) is and went through some of the mathematical proofs. Hopefully after reading this, you will not feel that all the math behind NTK is scary, but rather quite intuitive. lilianweng.github.io/posts/2022-09-…
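
As a quick reference for readers who skip the link, the central object can be stated in one line (a standard textbook definition, not quoted from the post):

```latex
% Empirical Neural Tangent Kernel of a network f(x; theta):
% the Gram matrix of parameter gradients between two inputs.
\Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^\top \, \nabla_\theta f(x';\theta)
```

In the infinite-width limit this kernel remains constant during gradient-descent training, which is why a wide trained network behaves like kernel regression with \Theta.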

Sam Altman (@sama)'s Twitter Profile Photo

the deadline for applying to the OpenAI residency is tomorrow. if you are an engineer or researcher from any field who wants to start working on AI, please consider applying. many of our best people have come from this program! boards.greenhouse.io/openai/jobs/46…

Jaehoon Lee (@hoonkp)'s Twitter Profile Photo

Very interesting paper by James Sully, Dan Roberts and Alex Maloney investigating the theoretical origin of neural scaling laws! Happy to read the 97-page paper and learn about new tools in RMT and insights into how the statistics of natural datasets translate into power-law scaling.

James Harrison (@jmes_harrison)'s Twitter Profile Photo

Tired of tuning your neural network optimizer? Wish there was an optimizer that just worked? We’re excited to release VeLO 🚲, the first hyperparameter-free learned optimizer that outperforms hand-designed optimizers on real-world problems: velo-code.github.io 🧵

Jaehoon Lee (@hoonkp)'s Twitter Profile Photo

Today at 11am CT, Hall J #806 we are presenting our paper on infinite-width neural network kernels! We have methods to compute NTK/NNGP for an extended set of activations + sketched embeddings for efficient approximation (100x) of compute-intensive conv kernels! See you there!
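
For context on what "computing NTK/NNGP" looks like in practice, here is a minimal sketch using the open-source neural-tangents library (the tweet does not name a library, and this generic fully-connected example does not use the paper's sketching tricks):

```python
# Minimal sketch: closed-form NNGP and NTK kernels for an infinite-width MLP,
# using the neural-tangents library (assumed here; not named in the tweet).
import jax.numpy as jnp
from neural_tangents import stax

# Infinite-width two-hidden-layer ReLU network.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x1 = jnp.ones((4, 8))  # 4 inputs with 8 features each
x2 = jnp.ones((6, 8))  # 6 inputs with 8 features each

# Exact infinite-width kernel matrices between the two batches.
kernels = kernel_fn(x1, x2, ('nngp', 'ntk'))
print(kernels.nngp.shape, kernels.ntk.shape)  # (4, 6) (4, 6)
```

The paper's contribution sits on top of this workflow: kernels for an extended set of activation functions, plus sketched embeddings that approximate the compute-intensive convolutional kernels roughly 100x more efficiently.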

Zi Wang, Ph.D. (@ziwphd)'s Twitter Profile Photo

Jasper talking about the ongoing journey towards BIG Gaussian processes! A team effort with Jaehoon Lee, Ben Adlam, Shreyas Padhy and Zachary Nado. Join us at the NeurIPS GP workshop neurips.cc/virtual/2022/w…

Jaehoon Lee (@hoonkp)'s Twitter Profile Photo

This is an amazing opportunity to work on impactful problems in Large Language Models with cool people! Highly recommended!

Jaehoon Lee (@hoonkp)'s Twitter Profile Photo

Analyzing training instabilities in Transformers is made more accessible by awesome work from Mitchell Wortsman during his internship at Google DeepMind! We encourage you to think more about the fundamental causes and effects of training instabilities as models scale up!

Noah Constant (@noahconst)'s Twitter Profile Photo

Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. arxiv.org/abs/2404.03626 With Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein

Brian Lester (@blester125)'s Twitter Profile Photo

Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks. Check out arxiv.org/abs/2404.03626 and help Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant and me make Kevin’s dream a reality.

Peter J. Liu (@peterjliu)'s Twitter Profile Photo

We recently open-sourced a relatively minimal implementation example of Transformer language model training in JAX, called NanoDO. If you stick to vanilla JAX components, the code is relatively straightforward to read -- the model file is <150 lines. We found it useful as a
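
As an illustration of the "vanilla JAX components" style (a hypothetical sketch, not the NanoDO code itself), a single causal self-attention layer fits comfortably in plain jax.numpy:

```python
# Hypothetical sketch of causal self-attention in plain JAX; illustrative only,
# not taken from NanoDO. Assumes a single head with d_head == d_model.
import jax
import jax.numpy as jnp

def causal_self_attention(params, x):
    """x: [seq_len, d_model] -> [seq_len, d_model]."""
    seq_len, _ = x.shape
    q, k, v = x @ params['wq'], x @ params['wk'], x @ params['wv']
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    mask = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))  # causal mask
    scores = jnp.where(mask, scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1)
    return (weights @ v) @ params['wo']

d_model, seq_len = 16, 8
keys = jax.random.split(jax.random.PRNGKey(0), 4)
params = {name: 0.02 * jax.random.normal(k, (d_model, d_model))
          for name, k in zip(['wq', 'wk', 'wv', 'wo'], keys)}
print(causal_self_attention(params, jnp.ones((seq_len, d_model))).shape)  # (8, 16)
```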

Peter J. Liu (@peterjliu)'s Twitter Profile Photo

It was a pleasure working on Gemma 2. The team is relatively small but very capable. Glad to see it get released. On the origin of techniques: 'like Grok', 'like Mistral', etc. is a weird way to describe them as they all originated at Google Brain/DeepMind and the way they ended

Jaehoon Lee (@hoonkp)'s Twitter Profile Photo

Tour de force led by Katie Everett investigating the interplay between neural network parameterization and optimizers; the thread/paper includes a lot of gems (theory insights, extensive empirics, and cool new tricks)!

Behnam Neyshabur (@bneyshabur)'s Twitter Profile Photo

Ethan Dyer and I have started a new team at Anthropic — and we’re hiring! Our team is organized around the north star goal of building an AI scientist: a system capable of solving the long-term reasoning challenges and core capabilities needed to push the scientific

Jaehoon Lee (@hoonkp)'s Twitter Profile Photo

Claude 4 models are here 🎉 From research to engineering, safety to product - this launch showcases what's possible when the entire Anthropic team comes together. Honored to be part of this journey! Claude has been transforming my daily workflow, hope it does the same for you!