trieu (@thtrieu_)'s Twitter Profile
trieu

@thtrieu_

inventor of #alphageometry. lead of alphageometry 2. thinking about thinking @ deepmind.

ID: 2464807964

Link: https://github.com/thtrieu · Joined: 26-04-2014 15:56:42

1.1K Tweets

2.2K Followers

135 Following

Bryan Johnson (@bryan_johnson)'s Twitter Profile Photo

Here’s another one for you. 

Blueprint went viral and the second most frequent comment was “I want to do your protocol, but it’s way too complicated, make it easy.”

#1 was “I hate you and want you to die” in various forms. 

Anyways, my team and I did it. Blueprint is the
wh (@nrehiew_)'s Twitter Profile Photo

With this extremely straightforward setup, the network learns to reflect/reevaluate its own answers. Again, this is done completely without supervision
Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

Let’s fucking goo!! DeepSeek R1 1.5B running FULLY LOCALLY in your browser at 60 tok/sec, powered by WebGPU 🔥 Intelligence truly is too cheap to meter! ⚡️

Tengyu Ma (@tengyuma)'s Twitter Profile Photo

RL + CoT works great for DeepSeek-R1 & o1, but: 

1️⃣ Linear-in-log scaling in train & test-time compute
2️⃣ Likely bounded by difficulty of training problems

Meet STP—a self-play algorithm that conjectures & proves indefinitely, scaling better! 🧠⚡🧵🧵

arxiv.org/abs/2502.00212
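
The thread doesn't include the algorithm itself, but the loop it describes (a model that keeps conjecturing new statements, tries to prove them, and trains on whatever it manages to prove) can be sketched roughly as below. Every name here, including the random stub model and the training hook, is a hypothetical stand-in rather than the paper's actual setup; see arxiv.org/abs/2502.00212 for the real method.

```python
# Rough sketch of a conjecture-and-prove self-play loop. All classes and
# methods are hypothetical stand-ins, not the STP paper's actual API.
import random


class StubProverConjecturer:
    """Stand-in for the single model that plays both roles in self-play."""

    def propose_conjecture(self, seed_statements):
        # Conjecturer role: propose a new statement related to known ones.
        seed = random.choice(seed_statements)
        return f"conjecture_{random.randint(0, 9999)} (building on {seed})"

    def attempt_proof(self, statement):
        # Prover role: try to prove the statement; success is random here.
        proved = random.random() < 0.3
        proof = f"proof_of({statement})" if proved else None
        return proved, proof

    def train_on(self, examples):
        # Placeholder for the RL / fine-tuning update on successful proofs.
        pass


def self_play_round(model, seed_statements, num_conjectures=8):
    training_examples = []
    for _ in range(num_conjectures):
        statement = model.propose_conjecture(seed_statements)
        proved, proof = model.attempt_proof(statement)
        if proved:
            # Proved conjectures become new training data and new seeds,
            # so the loop can in principle continue indefinitely.
            training_examples.append((statement, proof))
            seed_statements.append(statement)
    model.train_on(training_examples)
    return len(training_examples)


if __name__ == "__main__":
    model = StubProverConjecturer()
    seeds = ["axiom_0", "axiom_1"]
    for round_idx in range(5):
        proved = self_play_round(model, seeds)
        print(f"round {round_idx}: proved {proved} conjectures, {len(seeds)} seeds total")
```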
Jacob Austin (@jacobaustin132)'s Twitter Profile Photo

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
AK (@_akhaliq)'s Twitter Profile Photo

This looks wild. Google Gemini 2.0 one-shot with coder mode prompt: create an autonomous snake game, where 100 snakes compete with each other

Ahmed El-Kishky (@ahelkky)'s Twitter Profile Photo

9/ When we inspected the chain of thought, we discovered the model had independently developed its own test-time strategies. One interesting one was the model 1) wrote a simple brute-force solution first then 2) used it to validate a more complex optimized approach.

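
That strategy is straightforward to picture in code: keep a slow but obviously correct brute-force solution around and use it as an oracle to check the clever, optimized one on random inputs. The sketch below uses maximum subarray sum as my own illustrative problem; it is not the task from the thread.

```python
# Illustrative sketch of the test-time strategy described above: write a
# simple brute-force solution first, then use it to validate a faster,
# more complex approach on random test cases.
import random


def brute_force_max_subarray(nums):
    """O(n^2) reference: try every contiguous subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def optimized_max_subarray(nums):
    """O(n) Kadane's algorithm: the 'complex optimized approach'."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def cross_validate(trials=1000):
    """Check the optimized solution against the brute force on random inputs."""
    for _ in range(trials):
        nums = [random.randint(-10, 10) for _ in range(random.randint(1, 30))]
        assert brute_force_max_subarray(nums) == optimized_max_subarray(nums), nums
    print("optimized solution agrees with brute force on all random tests")


if __name__ == "__main__":
    cross_validate()
```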
Hieu Pham (@hyhieu226)'s Twitter Profile Photo

It's 11:30pm, and many xAI people are in the office, working hard at their computers. It's an amazing vibe. Everyone is pushing their hardest to deliver the best experience to you, the users. Everyone supports everyone. No one fucks anyone over with politics. You can just do things.

lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

BREAKING: @xAI early version of Grok-3 (codename "chocolate") is now #1 in Arena! 🏆

Grok-3 is:
- First-ever model to break 1400 score!
- #1 across all categories, a milestone that keeps getting harder to achieve

Huge congratulations to @xAI on this milestone! View thread 🧵
lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

Here you can see xAI Grok-3’s performance across all the top categories:
🔹 Overall w/ Style Control
🔹 Hard Prompts & Hard Prompt w/ Style Control
🔹 Coding
🔹 Math
🔹 Creative Writing
🔹 Instruction Following
🔹 Longer Query
🔹 Multi-Turn
Greg Yang (@thegregyang)'s Twitter Profile Photo

been grinding nonstop to make grok great again. may i say we did it? just now catching a breather, but no time for celebration (and little time for tweets). we have more coming for yall! ❤️

Zoomer Alcibiades (@hellenicvibes)'s Twitter Profile Photo

Google’s AI agent independently discovered:
- A new leukemia drug that was then successfully tested in vitro at clinical concentrations
- Novel liver fibrosis drug targets
- Bacterial cell-level antibiotic mechanisms

Looks like the “novel scientific discovery” line has been passed!

Noam Brown (@polynoamial)'s Twitter Profile Photo

We did not “solve math”. For example, our models are still not great at writing proofs. o3 and o4-mini are nowhere close to getting International Mathematical Olympiad gold medals.

hardmaru (@hardmaru)'s Twitter Profile Photo

Reinforcement Learning Teachers of Test Time Scaling

In this new paper, we introduce a new way to teach LLMs how to reason by learning to teach, not solve!

The core idea: A teacher model is trained via RL to generate explanations from question-answer pairs, optimized to improve
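
The tweet is cut off mid-sentence, but a natural way to set up this kind of teacher reward is to score an explanation by how much it helps a separate student model recover the known answer. The toy sketch below assumes exactly that; the stub classes, the reward definition, and the RL update are my assumptions, not the paper's actual recipe.

```python
# Hypothetical sketch of the "learning to teach, not solve" idea: the teacher
# sees (question, answer) and emits an explanation; the reward is assumed here
# to be how much the explanation raises the student's likelihood of the answer.
import random


class StubTeacher:
    def explain(self, question, answer):
        # Real version: an LLM conditioned on both the question and the answer.
        return f"step-by-step reasoning linking '{question}' to '{answer}'"

    def rl_update(self, question, answer, explanation, reward):
        # Placeholder for a policy-gradient / RL step on the teacher.
        pass


class StubStudent:
    def answer_logprob(self, question, answer, explanation=None):
        # Real version: log p_student(answer | question, explanation).
        base = random.uniform(-5.0, -2.0)
        return base + (1.0 if explanation else 0.0)


def train_teacher(teacher, student, qa_pairs, steps=3):
    for _ in range(steps):
        question, answer = random.choice(qa_pairs)
        explanation = teacher.explain(question, answer)
        # Assumed reward: improvement in the student's likelihood of the answer
        # when the explanation is provided versus when it is not.
        reward = (student.answer_logprob(question, answer, explanation)
                  - student.answer_logprob(question, answer))
        teacher.rl_update(question, answer, explanation, reward)
        print(f"reward={reward:.2f} for explanation of: {question}")


if __name__ == "__main__":
    pairs = [("2+2?", "4"), ("capital of France?", "Paris")]
    train_teacher(StubTeacher(), StubStudent(), pairs)
```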
Mikhail Samin (@mihonarium)'s Twitter Profile Photo

🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony.

According to a Coordinator on Problem 6, the one problem OpenAI