Xinyang (Young) Geng (@younggeng)'s Twitter Profile
Xinyang (Young) Geng

@younggeng

Research scientist at Google DeepMind. Opinions are my own.

ID: 2362406610

Website: http://young-geng.xyz/ · Joined: 26-02-2014 09:17:53

66 Tweets

1.1K Followers

513 Following

Lechao Xiao (@locchiu) 's Twitter Profile Photo

1/5. Excited to share a spicy paper, "Rethinking conventional wisdom in machine learning: from generalization to scaling", arxiv.org/pdf/2409.15156.  
You might love it or dislike it!  
NotebookLM: notebooklm.google.com/notebook/43f11…
While double-descent (generalization-centric,
Charlie Snell (@sea_snell) 's Twitter Profile Photo

Good post-training data is precious and scarce; compute is less so. We should focus on methods which squeeze more out of existing data by spending additional compute per datapoint, rather than optimizing for cheaper post-training methods

Cristian Garcia (@cgarciae88) 's Twitter Profile Photo

People learning JAX, feel free to reach out if the learning curve feels too steep; hopefully we can flatten it out. Also, check out the JAX LLM Discord for help from the community: discord.gg/m9NDrmENe2

Charlie Snell (@sea_snell) 's Twitter Profile Photo

Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task?

We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
Jerry Tworek (@millionint) 's Twitter Profile Photo

People completely misunderstand the data wall. It's the data slop wall. Most of the data is so bad it's a waste of a good gpu to backprop it.

Jack Rae (@jack_w_rae) 's Twitter Profile Photo

Appreciate Aidan McLaughlin looking into the thinking model results. Originally the scores looked weak because the response was being pulled from the thought content rather than the final output. We are looking into ways of making thinking output less confusing for people running evals. This is why we 🚢, to

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

It’s done because it’s much easier to 1) collect, 2) evaluate, and 3) beat and make progress on. We’re going to see every task that is served neatly packaged on a platter like this improved (including those that need PhD-grade expertise). But jobs (even intern-level) that need

Jim Fan (@drjimfan) 's Twitter Profile Photo

Whether you like it or not, the future of AI will not be canned genies controlled by a "safety panel". The future of AI is democratization. Every internet rando will run not just o1, but o8, o9 on their toaster laptop. It's the tide of history that we should surf on, not swim

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.
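As a rough illustration of what such an environment could look like (a minimal sketch, not from the tweet; the task, names, and reward scheme are all hypothetical), here is a tiny text-based environment following the familiar reset/step convention:

```python
# Hypothetical sketch of a text-based RL environment for eliciting LLM
# reasoning strategies. Task, names, and reward scheme are illustrative only.
from dataclasses import dataclass, field


@dataclass
class ArithmeticEnv:
    """One-step environment: the agent (an LLM) answers an arithmetic prompt."""
    a: int = 17
    b: int = 25
    done: bool = field(default=False, init=False)

    def reset(self) -> str:
        """Return the initial observation (the task prompt)."""
        self.done = False
        return f"Compute {self.a} + {self.b}. Answer with a single integer."

    def step(self, action: str):
        """Score the model's text action and return (obs, reward, done, info)."""
        try:
            reward = 1.0 if int(action.strip()) == self.a + self.b else 0.0
        except ValueError:
            reward = 0.0
        self.done = True
        return "", reward, self.done, {}


# Usage: roll out a "policy" (a hard-coded string standing in for an LLM call).
env = ArithmeticEnv()
prompt = env.reset()
_, reward, done, _ = env.step("42")
print(prompt, reward, done)
```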

Hieu Pham (@hyhieu226) 's Twitter Profile Photo

Despite many complaints about Jax being hard to use, it has a crucial advantage over PyTorch: for distributed jobs, XLA is sufficiently good at auto-scheduling parallelism strategies, e.g., sharding, overlapping compute and comms. If PyTorch becomes good at that, it's checkmate.
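For context, a minimal JAX sketch (mine, not from the tweet) of what this auto-scheduling looks like in practice: you annotate arrays with a sharding over a named device mesh, and XLA's GSPMD partitioner decides how to split the computation and insert/overlap the collectives. The mesh axis name below is arbitrary.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1D mesh over all local devices (CPU/GPU/TPU); the axis name "data" is arbitrary.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard a batch of activations along the "data" axis; keep weights replicated.
batch = 2 * len(jax.devices())  # batch size divisible by the device count
x = jax.device_put(jnp.ones((batch, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 1024)), NamedSharding(mesh, P(None, None)))


@jax.jit
def layer(x, w):
    # XLA/GSPMD propagates the input shardings through the computation and
    # inserts any needed collectives, overlapping compute and communication.
    return jnp.dot(x, w)


y = layer(x, w)
print(y.shape, y.sharding)
```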

Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
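As a flavor of the "it's math" view (my example, not from the book), the standard back-of-the-envelope estimate for dense-transformer training compute is roughly 6 × parameters × tokens FLOPs; the numbers below are made up for illustration.

```python
# Rule of thumb: training FLOPs ≈ 6 * N_params * N_tokens for a dense
# transformer (~2 FLOPs/param/token forward, ~4 backward).
n_params = 70e9   # hypothetical 70B-parameter model
n_tokens = 2e12   # hypothetical 2T training tokens
flops = 6 * n_params * n_tokens

# With an assumed sustained 400 TFLOP/s per chip, the chip-days required:
chip_flops_per_s = 400e12
chip_days = flops / chip_flops_per_s / 86400
print(f"{flops:.3e} FLOPs, ~{chip_days:,.0f} chip-days")
```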

rdyro (@rdyro128523) 's Twitter Profile Photo

Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in-progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
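Not from the repo itself, but a tiny JAX sketch of the kind of int8 weight quantization mentioned: store int8 weights plus a per-channel float scale, and rescale after the matmul at inference time. Function names here are illustrative.

```python
import jax
import jax.numpy as jnp


def quantize_int8(w):
    """Symmetric per-output-channel int8 quantization: int8 weights + float scale."""
    scale = jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0
    q = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return q, scale


def int8_matmul(x, q, scale):
    """Matmul against int8 weights; rescale and cast back to the activation dtype."""
    return (jnp.dot(x, q.astype(x.dtype)) * scale).astype(x.dtype)


w = jax.random.normal(jax.random.PRNGKey(0), (512, 1024), dtype=jnp.float32)
x = jax.random.normal(jax.random.PRNGKey(1), (4, 512), dtype=jnp.bfloat16)

q, scale = quantize_int8(w)
y = int8_matmul(x, q, scale)
print(y.shape, y.dtype)  # (4, 1024) bfloat16
```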

Jack Rae (@jack_w_rae) 's Twitter Profile Photo

Today we are launching 2.5 Pro!

I think it's the best model in the world. State-of-the-art reasoning and great vibes (+39 ELO gap on lmsys!)

2.5 Pro improves in coding, stem, multimodal, instruction following, and lots more. 

Available in AI Studio & the Gemini App!
Jack Rae (@jack_w_rae) 's Twitter Profile Photo

2.5 Flash is out! You can now specify thinking budgets, or disable thinking entirely for lower latency. Strong code & reasoning capabilities, cost effective, fast. It's a great workhorse model for enterprise and developers, excited to hear your feedback.
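For reference, a minimal sketch of setting a thinking budget with the google-genai Python SDK (assuming the current API shape; check the official docs, as details may change):

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the tradeoffs of tensor vs. expert parallelism.",
    config=types.GenerateContentConfig(
        # Cap the tokens spent on thinking; set to 0 to disable thinking
        # entirely for lower latency.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```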

Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
