Shane Bergsma (@shanebergsma)'s Twitter Profile
Shane Bergsma

@shanebergsma

Man bites data

ID: 482109376

Website: https://sites.google.com/site/shaneabergsma/ | Joined: 03-02-2012 14:53:26

206 Tweets

269 Followers

410 Following

Artificial Analysis (@artificialanlys)

Cerebras has set a new record for AI inference speed, serving Llama 3.1 8B at 1,850 output tokens/s and 70B at 446 output tokens/s.

@CerebrasSystems has just launched their API inference offering, powered by their custom wafer-scale AI accelerator chips.

Cerebras Inference is
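
For context, a throughput number like this can be sanity-checked against any OpenAI-compatible chat endpoint. The sketch below assumes such an endpoint for Cerebras Inference; the base URL, model id, and environment-variable name are illustrative assumptions, not confirmed values.

```python
# Rough throughput check against an OpenAI-compatible chat endpoint.
# ASSUMPTIONS: the base URL, model id, and env-var name below are illustrative
# guesses, not confirmed values -- check the Cerebras Inference docs.
import os
import time

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env-var name
)

start = time.perf_counter()
resp = client.chat.completions.create(
    model="llama3.1-8b",                      # assumed model id
    messages=[{"role": "user",
               "content": "Explain wafer-scale inference in one paragraph."}],
    max_tokens=512,
)
elapsed = time.perf_counter() - start

out_tokens = resp.usage.completion_tokens
print(f"{out_tokens} output tokens in {elapsed:.2f}s "
      f"~= {out_tokens / elapsed:.0f} tokens/s (includes network and prefill time)")
```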
Cerebras (@cerebrassystems)

It’s #ICLR2025 week, and we’re proud to share that Team Cerebras will be presenting their paper "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs" at ICLR 2025! Big congrats to the authors; your work is powering the future of AI compute.
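
The schedule the paper's title describes, decaying the learning rate linearly all the way to zero, can be sketched in a few lines of PyTorch. The warmup length, peak LR, and step counts below are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a warmup + linear-decay-to-zero LR schedule in PyTorch.
# ASSUMPTIONS: warmup length, peak LR, and step counts are illustrative only.
import torch
from torch.optim.lr_scheduler import LambdaLR

def linear_to_zero(step: int, total_steps: int, warmup_steps: int) -> float:
    """Multiplier on the peak LR: linear warmup, then linear decay to exactly 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))

model = torch.nn.Linear(512, 512)   # stand-in for an LLM
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
total_steps, warmup_steps = 10_000, 500
sched = LambdaLR(opt, lambda s: linear_to_zero(s, total_steps, warmup_steps))

for step in range(total_steps):
    opt.step()    # placeholder: a real loop would do forward/backward first
    sched.step()

print(sched.get_last_lr())   # -> [0.0] at the final step
```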

Nolan Dey (@deynolan)

(1/7) Cerebras Paper drop: arxiv.org/abs/2505.01618

TLDR: We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer (Left), FLOP savings when training deep models (Middle), and a larger range of compute-efficient width/depth ratios (Right).  🧵 👇
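
As a rough illustration of the depth-wise reparameterization idea behind results like this, the sketch below down-weights each residual branch by 1/L^alpha so that models of different depth keep comparable residual-stream dynamics, which is what makes tuning hyperparameters on a shallow model and reusing them on a deep one plausible. This is a toy sketch only, not the full CompleteP recipe (which also prescribes initialization, learning-rate, and weight-decay rules); the module shapes and the alpha=1.0 default are assumptions.

```python
# Toy sketch of depth-dependent residual scaling (NOT the full CompleteP recipe).
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    def __init__(self, d_model: int, depth: int, alpha: float = 1.0):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        # Down-weight each residual branch as the network gets deeper, so the
        # total update to the residual stream stays comparable across depths.
        self.branch_scale = 1.0 / (depth ** alpha)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.branch_scale * self.mlp(self.norm(x))

# Two models of different depth with (roughly) matched residual-stream dynamics.
shallow = nn.Sequential(*[ScaledResidualBlock(256, depth=4) for _ in range(4)])
deep = nn.Sequential(*[ScaledResidualBlock(256, depth=32) for _ in range(32)])
```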
Daria Soboleva (@dmsobol)

Major finding #1: λ=0.1 used in the majority of LLMs is suboptimal!

Our work shows that optimal weight decay (λ) scales linearly with batch size. Most researchers use the same λ regardless of batch size, leaving performance on the table.
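
The rule stated here, optimal λ scaling linearly with batch size, amounts to a one-line rescaling once λ has been tuned at some reference batch size. The reference values below are placeholders for illustration, not recommendations from the paper.

```python
# Minimal sketch of "scale weight decay linearly with batch size".
# ASSUMPTIONS: the reference batch size and reference lambda are placeholders;
# tune them at a small reference scale first.
def scaled_weight_decay(batch_size: int,
                        ref_batch_size: int = 256,
                        ref_lambda: float = 0.1) -> float:
    """Linearly rescale weight decay when the batch size changes."""
    return ref_lambda * (batch_size / ref_batch_size)

for bs in (256, 1024, 4096):
    print(f"batch_size={bs:5d} -> weight_decay={scaled_weight_decay(bs):.3f}")
```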
Shikai Qiu (@shikaiqiu)

Beautiful work on pretraining science using scaling collapse to precisely predict, debug, and tune LLM training from small-scale and partial runs. So many insights on going beyond µP!

Atli Kosson (@atlikosson)

The Maximal Update Parameterization (µP) allows LR transfer from small to large models, saving costly tuning. But why is independent weight decay (IWD) essential for it to work?

We find µP stabilizes early training (like an LR warmup), but IWD takes over in the long term! 🧵
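
To make the IWD distinction concrete: in PyTorch's AdamW the decay term is multiplied by the current learning rate, whereas "independent" weight decay removes the same fraction of the weight each step regardless of the LR. The sketch below contrasts the two update forms on a plain gradient step (Adam's moment machinery omitted); the hyperparameter values are illustrative assumptions.

```python
# Coupled (AdamW-style) vs. independent weight decay, shown on a bare gradient step.
# ASSUMPTIONS: lr and wd values are illustrative; Adam's moments are omitted for clarity.
import torch

def coupled_decay_step(w: torch.Tensor, grad: torch.Tensor, lr: float, wd: float) -> torch.Tensor:
    # torch.optim.AdamW-style: decay is multiplied by the learning rate,
    # so an LR warmup/decay schedule also rescales the effective decay.
    return w - lr * grad - lr * wd * w

def independent_decay_step(w: torch.Tensor, grad: torch.Tensor, lr: float, wd: float) -> torch.Tensor:
    # Independent weight decay (IWD): the same fraction of the weight is removed
    # each step, regardless of the current LR.
    return w - lr * grad - wd * w

w, g = torch.randn(8), torch.randn(8)
print(coupled_decay_step(w, g, lr=1e-3, wd=0.1))
print(independent_decay_step(w, g, lr=1e-3, wd=1e-4))
```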
Shane Bergsma (@shanebergsma)

Wikipedia (one of the supreme achievements of humanity) doesn't get enough love, so just let me say, "thank you, Wikipedia."

@vyedin.bsky.social (@vyedin)

In an effort to foster a more cooperative spirit between different parts of my code, I no longer pass *arguments* to a function. Instead when one function calls another, it passes along some *gentle feedback*.