Jeremy Cohen (@deepcohen)'s Twitter Profile
Jeremy Cohen

@deepcohen

Research fellow at Flatiron Institute, working on understanding optimization in deep learning. Previously: PhD in machine learning at Carnegie Mellon.

ID: 369877186

Link: http://cs.cmu.edu/~jeremiac
Joined: 08-09-2011 02:53:23

1.1K Tweets

4.4K Followers

908 Following

So Yeon (Tiffany) Min on Industry Job Market (@soyeontiffmin) 's Twitter Profile Photo

I am on the industry job market, and am planning to interview around next March. I am attending NeurIPS Conference, and I hope to meet you there if you are hiring! My website: soyeonm.github.io Short bio about me: I am a 5th year PhD student at CMU MLD, working with Russ Salakhutdinov

Alberto Bietti (@albertobietti) 's Twitter Profile Photo

Applications to our Research Fellow position at Flatiron CCM are closing soon on Dec 15! It's a great place for doing fundamental ML research with a lot of freedom in a great environment, in the heart of NYC. Apply here: apply.interfolio.com/155357

Berfin Simsek (@bsimsek13) 's Twitter Profile Photo

📢 I'm on the faculty job market this year! My research explores the foundations of deep learning and analyzes learning and feature geometry for Gaussian inputs. I detail my major contributions below 👇 Retweet if you find it interesting and help me spread the word! My DMs are open. 1/n

Jeremy Cohen (@deepcohen) 's Twitter Profile Photo

I’ll be at NeurIPS from Wednesday through Sunday. Would be great to meet with anyone interested in optimization dynamics of deep learning! DMs are open.

Dayal Kalra (@dayal_kalra) 's Twitter Profile Photo

I'll be at #NeurIPS2024 this week, presenting our work tomorrow on the mechanisms of warmup! openreview.net/forum?id=NVl4S… 📍 West Ballroom A-D (#5907) 📅 Wed, Dec 11 ⏰ 4:30 PM - 7:30 PM PST Looking forward to engaging discussions!
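
For context (not from the paper): learning-rate warmup simply ramps the stepsize from a small value up to its target over the first steps of training. Below is a minimal sketch of a linear warmup schedule in Python; the step count and the 1e-3 target value are illustrative, not taken from the work above.

```python
def lr_at_step(step, warmup_steps=1000, target_lr=1e-3):
    """Linear learning-rate warmup: ramp from ~0 to target_lr, then hold."""
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr
```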

Bobby (@bobby_he) 's Twitter Profile Photo

Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!

Ameet Talwalkar (@atalwalkar) 's Twitter Profile Photo

I have some news to share! Datadog, Inc. is forming a new AI research lab, and I'm excited to announce that I've joined as Chief Scientist to lead this effort. Datadog has a great work culture, lots of data and compute, and is committed to open science and open sourcing. Our team

Pierfrancesco Beneventano (@pierbeneventano) 's Twitter Profile Photo

Arseniy and I have, I believe, taken a step towards properly characterizing how and when mini-batch SGD training exhibits the Edge of Stability / Break-Even Point (Stanisław Jastrzębski, Jeremy Cohen). Link: arxiv.org/abs/2412.20553

Samuel Sokota (@ssokota) 's Twitter Profile Photo

Model-free deep RL algorithms like NFSP, PSRO, ESCHER, & R-NaD are tailor-made for games with hidden information (e.g. poker). We performed the largest-ever comparison of these algorithms. We find that they do not outperform generic policy gradient methods, such as PPO. 1/N

Jacob Springer (@jacspringer) 's Twitter Profile Photo

Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decrease your performance after post-training! Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇 1/9

Christina Baek (@_christinabaek) 's Twitter Profile Photo

Are current reasoning models optimal for test-time scaling? 🌠 No! Models make the same incorrect guess over and over again. We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math! 1/N
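
For context, WiSE-FT-style weight ensembling linearly interpolates the weights of two checkpoints of the same architecture. Here is a minimal PyTorch sketch, assuming the two models share identical state dicts; the function name and the alpha=0.5 default are illustrative, not from the paper.

```python
import copy
import torch

def wise_ft_merge(base_model, finetuned_model, alpha=0.5):
    """Weight ensembling in the spirit of WiSE-FT: return a copy of base_model
    whose parameters are (1 - alpha) * theta_base + alpha * theta_finetuned.
    Both models must have identical architectures. Sketch only."""
    merged = copy.deepcopy(base_model)
    base_sd = base_model.state_dict()
    ft_sd = finetuned_model.state_dict()
    merged_sd = {}
    for name, tensor in base_sd.items():
        if torch.is_floating_point(tensor):
            merged_sd[name] = (1 - alpha) * tensor + alpha * ft_sd[name]
        else:
            merged_sd[name] = tensor  # e.g. integer buffers: keep as-is
    merged.load_state_dict(merged_sd)
    return merged
```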

Asher Trockman (@ashertrockman) 's Twitter Profile Photo

Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your competitors' models are ... thinking a bit too much like yours? Then antidistillation.com might be for you! Sam Altman Elon Musk

Dayal Kalra (@dayal_kalra) 's Twitter Profile Photo

Excited to share that our paper "Universal Sharpness Dynamics..." has been accepted to #ICLR2025! Neural net training exhibits rich curvature (sharpness) dynamics (sharpness reduction, progressive sharpening, Edge of Stability) - but why? 🤔 We show that a minimal model captures it all! 1/n
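
For readers new to the term: "sharpness" here means the largest eigenvalue of the training-loss Hessian. Below is a minimal sketch of how it is typically estimated, via power iteration on Hessian-vector products; it assumes a single flat parameter tensor with requires_grad=True and is not code from the paper.

```python
import torch

def estimate_sharpness(loss_fn, params, num_iters=20):
    """Estimate the top Hessian eigenvalue (sharpness) by power iteration
    on Hessian-vector products. `params` is a flat tensor with
    requires_grad=True and `loss_fn(params)` returns a scalar loss."""
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)
    v = torch.randn_like(params)
    v /= v.norm()
    eigval = torch.tensor(0.0)
    for _ in range(num_iters):
        (hv,) = torch.autograd.grad(grad @ v, params, retain_graph=True)
        eigval = v @ hv                 # Rayleigh quotient with the current v
        v = hv / (hv.norm() + 1e-12)
    return eigval.item()
```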

Vaishnavh Nagarajan (@_vaishnavh) 's Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵

Robert M. Gower 🇺🇦 (@gowerrobert) 's Twitter Profile Photo

Are you interested in the new Muon/Scion/Gluon method for training LLMs? To run Muon, you need to approximate the matrix sign (or polar factor) of the momentum matrix. We've developed an optimal method *The PolarExpress* just for this! If you're interested, climb aboard 1/x
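
Background, for readers outside this niche: the polar factor of a matrix with SVD UΣVᵀ is UVᵀ (the nearest orthogonal matrix), and Muon-style updates replace the raw momentum matrix with an approximation of it. Here is a minimal sketch using the textbook cubic Newton-Schulz iteration; the PolarExpress itself derives optimized coefficients, which are not reproduced here.

```python
import torch

def approx_polar_factor(m, num_iters=8, eps=1e-7):
    """Approximate the polar factor U V^T of a matrix M = U S V^T using the
    plain Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X. Normalizing by
    the Frobenius norm first keeps the spectral norm <= 1, so the singular
    values converge towards 1. Sketch only: not the PolarExpress coefficients."""
    x = m / (m.norm() + eps)
    for _ in range(num_iters):
        x = 1.5 * x - 0.5 * x @ x.transpose(-2, -1) @ x
    return x
```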

Jingfeng Wu (@uuujingfeng) 's Twitter Profile Photo

1/3 Sharing two new papers on accelerating GD via large stepsizes! Classical GD analysis assumes small stepsizes for stability. However, in practice, GD is often used with large stepsizes, which lead to instability. See my slides for more details: uuujf.github.io/postdoc/wu2025…
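
To make the stability threshold concrete, here is a tiny worked example (mine, not from the slides): for gradient descent on a quadratic with curvature L, the classical analysis requires stepsize eta < 2/L; above that threshold the iterates oscillate with growing magnitude.

```python
# Gradient descent on f(x) = 0.5 * L * x^2 with L = 1, so the classical
# stability threshold is eta < 2/L = 2. Each step is x <- (1 - eta*L) * x.
def gd_trajectory(eta, x0=1.0, steps=8, L=1.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * L * xs[-1])
    return xs

print(gd_trajectory(eta=0.5))  # stable: |x| shrinks monotonically
print(gd_trajectory(eta=2.5))  # unstable: sign flips, |x| grows by 1.5x per step
```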
