Sean McLeish (@seanmcleish)'s Twitter Profile
Sean McLeish

@seanmcleish

PhD student at the University of Maryland

ID: 1727125615600451584

Joined: 22-11-2023 00:43:49

72 Tweets

432 Followers

97 Following

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8):

Gemstones: A Model Suite for Multi-Faceted Scaling Laws paper: arxiv.org/abs/2502.06857 code: github.com/mcleish7/gemst… The project analyzes scaling laws across 22 AI models, ranging from 50M to 2B parameters, by varying model width and depth.
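
For intuition, here is a minimal sketch of how such a fit can work, assuming a Chinchilla-style form L(N, D) = E + A/N^α + B/D^β regressed on (parameters, tokens, loss) triples; all data values below are synthetic placeholders, not Gemstones results.

```python
# Hedged sketch: fitting a Chinchilla-style scaling law
#   L(N, D) = E + A / N^alpha + B / D^beta
# to (parameters, tokens, loss) observations with scipy.
# All numbers below are synthetic placeholders, not Gemstones data.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, E, A, alpha, B, beta):
    N, D = X  # model parameters, training tokens
    return E + A / N**alpha + B / D**beta

# Synthetic observations spanning roughly 50M-2B parameters.
N = np.array([5e7, 5e7, 1e8, 1e8, 5e8, 5e8, 1e9, 2e9])
D = np.array([1e9, 5e9, 2e9, 1e10, 1e10, 5e10, 5e10, 1e11])
loss = scaling_law((N, D), 1.7, 400.0, 0.34, 410.0, 0.28)
loss += np.random.default_rng(0).normal(0, 0.005, size=loss.shape)

popt, _ = curve_fit(scaling_law, (N, D), loss,
                    p0=[2.0, 300.0, 0.3, 300.0, 0.3], maxfev=50_000)
print(dict(zip(["E", "A", "alpha", "B", "beta"], popt.round(3))))
```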

Tom Goldstein (@tomgoldsteincs):

Our new models for studying scaling laws are out! The Gemstones are 4K checkpoints (22 models) trained on 10T tokens combined, with varying architectures and learning rates. Here’s my fav new scaling experiment. It explains why industry has abandoned big dense models 🧵 (1/4)

AGI.Eth (@ceobillionaire):

Gemstones: A Model Suite for Multi-Faceted Scaling Laws McLeish et al.: arxiv.org/abs/2502.06857 #ArtificialIntelligence #DeepLearning #MachineLearning

DAIR.AI (@dair_ai):

Here are the top AI Papers of the Week (Feb 10-16):
- Latent Reasoning
- Large Memory Models
- Brain-to-Text Decoding
- Enhancing Reasoning to Adapt LLMs
- Reinforcement Learning via Self-Play
- Competitive Programming with Large Reasoning Models
Read on for more:

Sean McLeish (@seanmcleish):

If there is another McLeish/Schwarzschild duo doing recurrent deep learning out there, Avi Schwarzschild and I would love to meet you. Otherwise how are we dealing with the overload of ChatGPT made up references these days?! 🤯

Ashwinee Panda (@pandaashwinee):

people are talking about whether scaling laws are broken or pretraining is saturating. so what does that even mean? consider the loss curves from our recent gemstones paper. as we add larger models, the convex hull doesn’t flatten out on this log-log plot. that's good!

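To make that diagnostic concrete, here is an illustrative sketch (invented numbers, not the paper's data): collect one (compute, loss) point per model, take the lower convex hull in log-log space, and inspect its slopes; saturation would show up as slopes trending toward zero at the high-compute end.

```python
# Illustrative sketch of the convex-hull diagnostic; the (FLOPs, loss)
# values below are invented, not Gemstones measurements.
import numpy as np

flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20, 3e20, 1e21])
loss  = np.array([3.90, 3.50, 3.20, 3.00, 2.80, 2.65, 2.50])

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull (Andrew's monotone chain) of x-sorted points."""
    hull = []
    for p in sorted(points):
        # Pop while the last hull point makes a non-convex (clockwise) turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

hull = lower_hull(list(zip(np.log(flops), np.log(loss))))
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(hull, hull[1:])]
print("log-log hull slopes:", [round(s, 3) for s in slopes])
# Slopes flattening toward 0 on the right would indicate saturation.
```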
Alexander Doria (@dorialexander):

Multiple interesting experiments and findings for pretraining recipes. I especially liked the part about width/depth (not trivial to choose when you’re in the 1-3B range).
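
As a back-of-the-envelope illustration of why that choice is non-trivial, a common approximation counts about 12 · width² · depth parameters for the transformer blocks plus an embedding term; the vocab size and shapes below are hypothetical, chosen only to show how very different width/depth pairs land at a similar budget.

```python
# Rough sketch of width/depth trade-offs at a fixed parameter budget.
# Uses the standard ~12 * width^2 params-per-block approximation
# (attention + 4x-expanded MLP); vocab size and shapes are hypothetical.
VOCAB = 50_000

def approx_params(width: int, depth: int) -> float:
    return 12 * width**2 * depth + VOCAB * width

for width, depth in [(1536, 48), (2048, 28), (2560, 18), (3072, 12)]:
    total = approx_params(width, depth)
    print(f"width={width:5d}  depth={depth:3d}  ->  {total / 1e9:.2f}B params")
```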

The TWIML AI Podcast (@twimlai):

Today, we're joined by Jonas Geiping, research group leader at the ELLIS Institute Tübingen and the Max Planck Institute for Intelligent Systems, to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture that scales test-time compute by reasoning recurrently in latent space instead of generating more tokens.
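
The gist of the architecture, as a minimal hedged sketch (illustrative only, not the paper's implementation): a shared block is iterated in latent space, so test-time compute grows with the number of recurrence steps rather than with the number of generated tokens.

```python
# Minimal sketch of the recurrent-depth idea; hyperparameters, the vocab
# size, and the state update are illustrative, not the paper's design.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab=32_000, d_model=256, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.core = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens, num_steps: int = 4):
        h = self.embed(tokens)
        state = torch.zeros_like(h)          # latent reasoning state
        for _ in range(num_steps):           # unrolled to variable depth
            state = self.core(state + h)     # re-inject the input each step
        return self.head(state)

model = RecurrentDepthLM()
tokens = torch.randint(0, 32_000, (1, 16))
# More test-time compute = more recurrence steps, same parameter count.
print(model(tokens, num_steps=8).shape)      # torch.Size([1, 16, 32000])
```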

Lawrence Livermore National Laboratory (@livermore_lab):

#LLNL is not just advancing #AI, we are redefining how AI and science converge to unlock the next era of discovery. Huginn represents a new breed of language models that emphasizes careful introspection over immediate yet often incomplete answers. livermorelab.info/4ianNIM

Dayal Kalra (@dayal_kalra):

Excited to share our paper "Universal Sharpness Dynamics..." is accepted to #ICLR2025! Neural net training exhibits rich curvature (sharpness) dynamics (sharpness reduction, progressive sharpening, Edge of Stability) - but why? 🤔 We show that a minimal model captures it all! 1/n
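
For reference, "sharpness" here means the top eigenvalue of the training-loss Hessian, commonly tracked via power iteration on Hessian-vector products. A hedged sketch of that standard measurement (a toy model for illustration, not the paper's code):

```python
# Standard sharpness probe: power iteration on Hessian-vector products,
# where each HVP is the gradient of (grad . v). Toy model for illustration.
import torch

def sharpness(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = torch.tensor(0.0)
    for _ in range(iters):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = v @ hv                  # Rayleigh quotient with unit-norm v
        v = hv / hv.norm()
    return eig.item()

model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
print("top Hessian eigenvalue ~", round(sharpness(loss, list(model.parameters())), 4))
```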

John Kirchenbauer (@jwkirchenbauer):

Before you leave Singapore be sure to check out the Tomlab's trio of pretraining papers at the Open Science for Foundation Models (SCI-FM) workshop in Hall 4 #5! Jonas and I will be around the rest of the afternoon to share our AMD war stories 🥲

Kimon Fountoulakis (@kfountou):

Update: 14 empirical papers added.
1. Learning to Execute. Wojciech Zaremba, Ilya Sutskever
2. Neural Programmer-Interpreters. Scott Reed, Nando de Freitas
3. Neural Programmer: Inducing Latent Programs with Gradient Descent. Arvind Neelakantan, Quoc V. Le, Ilya Sutskever
4.

Alexander Panfilov (@kotekjedi_ml):


Stronger models need stronger attackers! 🤖⚔️
In our new paper we explore how attacker-target capability dynamics affect red-teaming success (ASR).

Key insights:
🔸Stronger models = better attackers
🔸ASR depends on capability gap
🔸Psychology >> STEM for ASR

More in 🧵👇
Ruchit Rawal (@rawalruchit):


Introducing ARGUS 👁️

A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.
Avi Schwarzschild (@a_v_i__s):

Ever tried to tell if someone really forgot your birthday? ... evaluating forgetting is tricky. Now imagine doing that… but for an LLM… with privacy on the line. We studied how to evaluate machine unlearning, and we found some problems. 🧵

Avi Schwarzschild (@a_v_i__s):

Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time at OpenAI working on LLM privacy. UNC Computer Science UNC NLP
