neptune.ai (@neptune_ai) 's Twitter Profile
neptune.ai

@neptune_ai

Experiment tracker purpose-built for foundation model training.

We tweet about #LLM best practices & other cool stuff.
Read our blog at neptune.ai/blog

ID: 948867833433329664

https://neptune.ai · Joined: 04-01-2018 10:44:55

12.12K Tweets

7.7K Followers

874 Following

neptune.ai (@neptune_ai) 's Twitter Profile Photo

You can now normalize your chart series by the first value. Turn it on and every line starts at 0. Helpful when comparing multiple runs with different starting points.
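
"Normalize by first value" most likely just shifts each series by its first sample so every line starts at 0. A minimal sketch under that assumption (the exact formula Neptune uses is not stated in the tweet):

```python
def normalize_by_first(series):
    """Shift a metric series so it starts at 0 (one plausible
    interpretation of 'normalize by the first value')."""
    if not series:
        return []
    first = series[0]
    return [v - first for v in series]

# Two runs with different starting losses become directly comparable:
run_a = normalize_by_first([4.0, 3.2, 2.9])  # starts at 0.0
run_b = normalize_by_first([9.0, 8.1, 7.7])  # also starts at 0.0
```

With both series anchored at 0, the chart compares the change per run rather than the absolute level.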

neptune.ai (@neptune_ai) 's Twitter Profile Photo

Our ML-inspired coffee menu is live at ICML 2025. 

2 bikes at the VCC main entrance. Pouring up to 800 cups a day—when it’s gone, it’s gone! 

Will you go for the "MoE-tcha Latte" or "AdamEricano"?

Bonus: Score a pocket notebook for those “wait, this might actually work”
neptune.ai (@neptune_ai) 's Twitter Profile Photo

Turns out more teams than you’d expect rank debugging I/O bottlenecks among the top challenges in training foundation models. While modeling issues still exist, many consider them “mostly solved.” 🔍 The State of #FoundationModel Training Report shows → top FM teams

neptune.ai (@neptune_ai) 's Twitter Profile Photo

Every experiment deserves documentation, and so do the moments that make ICML special. Our party frames are making the rounds.

What will you take home from the conference? We’re hoping it’s a camera roll full of memories, new connections, and plenty of insights.

That’s why this
neptune.ai (@neptune_ai) 's Twitter Profile Photo

If you're training large models, you've likely spent too much time trying to debug training dynamics from raw numbers. We have now added support for histograms in neptune.ai. Whether it's exploding gradients or subtle drifts in activation distributions, now you can catch it
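
A rough sketch of what a per-layer histogram captures. The binning scheme below is an assumption for illustration, not Neptune's implementation; fixed bin edges keep histograms comparable across steps and runs:

```python
import math

def fixed_range_histogram(values, bins=8, lo=-1.0, hi=1.0):
    """Bucket values into fixed-range bins (a minimal sketch of the
    per-layer histograms an experiment tracker can visualize)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
        elif v == hi:  # include the right edge in the last bin
            counts[-1] += 1
    return counts

# A layer whose gradients have mostly vanished piles up in the center bins,
# while a healthy layer spreads across the range:
healthy = [math.sin(i) * 0.5 for i in range(100)]
vanished = [1e-9] * 100
```

Logged every N steps for each layer, this kind of distribution is what reveals exploding gradients or slow drifts in activations before they show up in the loss.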

neptune.ai (@neptune_ai) 's Twitter Profile Photo

In case you missed it: Dark mode is now available in neptune.ai. It’s clean, easy on the eyes, and just... nice to look at. Give it a spin.

neptune.ai (@neptune_ai) 's Twitter Profile Photo

We brought together 30 brilliant minds from OpenAI, Google DeepMind, InstaDeep, Amazon, Hugging Face, and more. One boat. Discussions on the challenges of SOTA foundation model training, building scalable infra, and ensuring stability in training, among other topics.

neptune.ai (@neptune_ai) 's Twitter Profile Photo

“We spotted a silent weight-decay issue 10 hours earlier just by glancing at the weight-norm histogram.” At 100+ #GPU scale, that’s days of compute saved. And it’s now a built-in view in neptune.ai. Equally snappy and responsive as all the visualizations in our app.
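
The scalar behind a weight-norm view is just a per-layer L2 norm. A minimal pure-Python sketch (layer values here are illustrative, not from the quoted team):

```python
import math

def l2_norm(weights):
    """L2 norm of a flat list of weights -- the per-layer scalar
    a weight-norm histogram is built from."""
    return math.sqrt(sum(w * w for w in weights))

# A silent weight-decay bug typically shows up as norms shrinking
# step after step while the loss curve still looks plausible:
norm_step_0 = l2_norm([0.5] * 16)   # 2.0
norm_step_k = l2_norm([0.05] * 16)  # ~0.2, an order of magnitude smaller
```

Tracking this per layer over training is what lets you spot the shrinkage hours before it degrades the loss.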

neptune.ai (@neptune_ai) 's Twitter Profile Photo

Infra predictability now outweighs cloud flexibility. Teams in the report cite better control, cost, and stability as reasons for building GPU clusters in-house. If your team depends on reliable access and tight tuning, on-prem may be the default. 📘 More State of #FoundationModelTraining

neptune.ai (@neptune_ai) 's Twitter Profile Photo

Small feature, but a very useful one: “Normalize by first value” is now available in charts. Makes it easier to see relative performance across runs, even if their scales differ.

neptune.ai (@neptune_ai) 's Twitter Profile Photo

[New on our blog] How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models by Klea Ziu

TL;DR

→ Vanishing or exploding gradients are common training instabilities observed in foundation models.
 
→ Real-time gradient-norm monitoring using experiment trackers
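
In practice, real-time gradient-norm monitoring boils down to computing a global L2 norm every step and flagging outliers. A hedged sketch (the thresholds are illustrative placeholders, not values from the blog post):

```python
import math

# Illustrative thresholds -- tune for your model and optimizer.
EXPLODE_ABOVE = 1e2
VANISH_BELOW = 1e-6

def global_grad_norm(grads_per_layer):
    """Global L2 norm across all layers' gradients (lists of floats)."""
    total = sum(g * g for layer in grads_per_layer for g in layer)
    return math.sqrt(total)

def classify_step(grads_per_layer):
    """Return (norm, label) so the norm can be logged every step and
    the label used to alert on instabilities."""
    norm = global_grad_norm(grads_per_layer)
    if norm > EXPLODE_ABOVE:
        return norm, "exploding"
    if norm < VANISH_BELOW:
        return norm, "vanishing"
    return norm, "ok"
```

Logging the returned norm to an experiment tracker every step gives the real-time view the post describes; the labels are just a convenience for alerting.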
neptune.ai (@neptune_ai) 's Twitter Profile Photo

Histograms reveal what line charts miss: the partial vanish/explode cases that kill #foundationmodel training stability. neptune.ai now lets you log and visualize histograms per layer, per step, across runs. And it doesn’t slow down the app. Play with an example here:
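
One way a histogram exposes a "partial vanish" that a single norm line hides is the fraction of near-zero entries. A small illustrative sketch (the epsilon is an assumption):

```python
def near_zero_fraction(grad, eps=1e-8):
    """Share of gradient entries with magnitude below eps -- a summary
    a histogram makes visible but a single norm chart can hide."""
    return sum(1 for g in grad if abs(g) < eps) / len(grad)

# Half the entries are dead, yet the surviving half keeps the overall
# norm looking healthy -- the partial-vanish case:
grad = [0.0] * 500 + [0.1] * 500
frac = near_zero_fraction(grad)  # 0.5
```

On a line chart of the L2 norm this run looks fine; the histogram (or this summary of it) shows half the layer has stopped learning.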

neptune.ai (@neptune_ai) 's Twitter Profile Photo

“This is how I decide which model to kill and which one to keep.”

Oliver Lammas, a founding engineer at Navier AI, trains large-scale models (with tens of millions of parameters) on complex, high-fidelity computational fluid dynamics (CFD) datasets.

To pick the best runs, he
neptune.ai (@neptune_ai) 's Twitter Profile Photo

We know dark mode won’t change your life. But it might make your 3-hour debugging evening session a little less harsh. Now you can switch from light to dark mode in neptune.ai.

neptune.ai (@neptune_ai) 's Twitter Profile Photo

Self-hosted shouldn’t feel like a second-class citizen. With neptune.ai, it doesn’t. Our self-hosted deployment is built for scale: high-throughput ingestion, fast retrieval, resilient architecture, and no forced path back to SaaS. It runs where you need it to. And stays