trieu (@thtrieu_)'s Twitter Profile
trieu

@thtrieu_

inventor of #alphageometry. lead of alphageometry 2. thinking about thinking @ deepmind.

ID: 2464807964

Link: https://github.com/thtrieu · Joined: 26-04-2014 15:56:42

1.1K Tweets

2.2K Followers

135 Following

Bryan Johnson (@bryan_johnson)'s Twitter Profile Photo

Here’s another one for you. 

Blueprint went viral and the second most frequent comment was “I want to do your protocol, but it’s way too complicated, make it easy.”

#1 was “I hate you and want you to die” in various forms. 

Anyways, my team and I did it. Blueprint is the
wh (@nrehiew_)'s Twitter Profile Photo

With this extremely straightforward setup, the network learns to reflect/reevaluate its own answers. Again, this is done completely without supervision
Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

Let’s fucking goo!! DeepSeek R1 1.5B running FULLY LOCALLY in your browser at 60 tok/sec, powered by WebGPU 🔥 Intelligence truly is too cheap to meter! ⚡️

Tengyu Ma (@tengyuma)'s Twitter Profile Photo

RL + CoT works great for DeepSeek-R1 & o1, but: 

1️⃣ Linear-in-log scaling in train & test-time compute
2️⃣ Likely bounded by difficulty of training problems

Meet STP—a self-play algorithm that conjectures & proves indefinitely, scaling better! 🧠⚡🧵🧵

arxiv.org/abs/2502.00212
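
The thread doesn't include the algorithm itself, but the loop it describes (a model that keeps conjecturing new statements, tries to prove them, and trains on whatever it manages to prove) can be sketched roughly as below. Every name here, including the random stub model and the training hook, is a hypothetical stand-in rather than the paper's actual setup; see arxiv.org/abs/2502.00212 for the real method.

```python
# Rough sketch of a conjecture-and-prove self-play loop. All classes and
# methods are hypothetical stand-ins, not the STP paper's actual API.
import random


class StubProverConjecturer:
    """Stand-in for the single model that plays both roles in self-play."""

    def propose_conjecture(self, seed_statements):
        # Conjecturer role: propose a new statement related to known ones.
        seed = random.choice(seed_statements)
        return f"conjecture_{random.randint(0, 9999)} (building on {seed})"

    def attempt_proof(self, statement):
        # Prover role: try to prove the statement; success is random here.
        proved = random.random() < 0.3
        proof = f"proof_of({statement})" if proved else None
        return proved, proof

    def train_on(self, examples):
        # Placeholder for the RL / fine-tuning update on successful proofs.
        pass


def self_play_round(model, seed_statements, num_conjectures=8):
    training_examples = []
    for _ in range(num_conjectures):
        statement = model.propose_conjecture(seed_statements)
        proved, proof = model.attempt_proof(statement)
        if proved:
            # Proved conjectures become new training data and new seeds,
            # so the loop can in principle continue indefinitely.
            training_examples.append((statement, proof))
            seed_statements.append(statement)
    model.train_on(training_examples)
    return len(training_examples)


if __name__ == "__main__":
    model = StubProverConjecturer()
    seeds = ["axiom_0", "axiom_1"]
    for round_idx in range(5):
        proved = self_play_round(model, seeds)
        print(f"round {round_idx}: proved {proved} conjectures, {len(seeds)} seeds total")
```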
Jacob Austin (@jacobaustin132)'s Twitter Profile Photo

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
AK (@_akhaliq)'s Twitter Profile Photo

This looks wild. Google Gemini 2.0 one-shot with coder mode prompt: create an autonomous snake game, where 100 snakes compete with each other

Ahmed El-Kishky (@ahelkky)'s Twitter Profile Photo

9/ When we inspected the chain of thought, we discovered the model had independently developed its own test-time strategies. One interesting one was the model 1) wrote a simple brute-force solution first then 2) used it to validate a more complex optimized approach.

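
That strategy is straightforward to picture in code: keep a slow but obviously correct brute-force solution around and use it as an oracle to check the clever, optimized one on random inputs. The sketch below uses maximum subarray sum as my own illustrative problem; it is not the task from the thread.

```python
# Illustrative sketch of the test-time strategy described above: write a
# simple brute-force solution first, then use it to validate a faster,
# more complex approach on random test cases.
import random


def brute_force_max_subarray(nums):
    """O(n^2) reference: try every contiguous subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def optimized_max_subarray(nums):
    """O(n) Kadane's algorithm: the 'complex optimized approach'."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def cross_validate(trials=1000):
    """Check the optimized solution against the brute force on random inputs."""
    for _ in range(trials):
        nums = [random.randint(-10, 10) for _ in range(random.randint(1, 30))]
        assert brute_force_max_subarray(nums) == optimized_max_subarray(nums), nums
    print("optimized solution agrees with brute force on all random tests")


if __name__ == "__main__":
    cross_validate()
```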
Hieu Pham (@hyhieu226)'s Twitter Profile Photo

It's 11:30pm, and many xAI people are in the office, working hard at their computers. It's an amazing vibe. Everyone is pushing their hardest to deliver the best experience to you, the users. Everyone supports everyone. No one fucks anyone over with politics. You can just do things.

lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

BREAKING: @xAI early version of Grok-3 (codename "chocolate") is now #1 in Arena! 🏆

Grok-3 is:
- First-ever model to break 1400 score!
- #1 across all categories, a milestone that keeps getting harder to achieve

Huge congratulations to @xAI on this milestone! View thread 🧵
lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

Here you can see xAI Grok-3’s performance across all the top categories:
🔹 Overall w/ Style Control
🔹 Hard Prompts & Hard Prompt w/ Style Control
🔹 Coding
🔹 Math
🔹 Creative Writing
🔹 Instruction Following
🔹 Longer Query
🔹 Multi-Turn
Greg Yang (@thegregyang)'s Twitter Profile Photo

been grinding nonstop to make grok great again. may i say we did it? just now catching a breather, but no time for celebration (and little time for tweets). we have more coming for yall! ❤️

Zoomer Alcibiades (@hellenicvibes)'s Twitter Profile Photo

Google’s AI agent independently discovered:
- A new leukemia drug that was then successfully tested in vitro at clinical concentrations
- Novel liver fibrosis drug targets
- Bacterial cell-level antibiotic mechanisms

Looks like the “novel scientific discovery” line has been passed!

Noam Brown (@polynoamial)'s Twitter Profile Photo

We did not “solve math”. For example, our models are still not great at writing proofs. o3 and o4-mini are nowhere close to getting International Mathematical Olympiad gold medals.

hardmaru (@hardmaru)'s Twitter Profile Photo

Reinforcement Learning Teachers of Test Time Scaling

In this new paper, we introduce a new way to teach LLMs how to reason by learning to teach, not solve!

The core idea: A teacher model is trained via RL to generate explanations from question-answer pairs, optimized to improve
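
The tweet is cut off mid-sentence, but a natural way to set up this kind of teacher reward is to score an explanation by how much it helps a separate student model recover the known answer. The toy sketch below assumes exactly that; the stub classes, the reward definition, and the RL update are my assumptions, not the paper's actual recipe.

```python
# Hypothetical sketch of the "learning to teach, not solve" idea: the teacher
# sees (question, answer) and emits an explanation; the reward is assumed here
# to be how much the explanation raises the student's likelihood of the answer.
import random


class StubTeacher:
    def explain(self, question, answer):
        # Real version: an LLM conditioned on both the question and the answer.
        return f"step-by-step reasoning linking '{question}' to '{answer}'"

    def rl_update(self, question, answer, explanation, reward):
        # Placeholder for a policy-gradient / RL step on the teacher.
        pass


class StubStudent:
    def answer_logprob(self, question, answer, explanation=None):
        # Real version: log p_student(answer | question, explanation).
        base = random.uniform(-5.0, -2.0)
        return base + (1.0 if explanation else 0.0)


def train_teacher(teacher, student, qa_pairs, steps=3):
    for _ in range(steps):
        question, answer = random.choice(qa_pairs)
        explanation = teacher.explain(question, answer)
        # Assumed reward: improvement in the student's likelihood of the answer
        # when the explanation is provided versus when it is not.
        reward = (student.answer_logprob(question, answer, explanation)
                  - student.answer_logprob(question, answer))
        teacher.rl_update(question, answer, explanation, reward)
        print(f"reward={reward:.2f} for explanation of: {question}")


if __name__ == "__main__":
    pairs = [("2+2?", "4"), ("capital of France?", "Paris")]
    train_teacher(StubTeacher(), StubStudent(), pairs)
```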
Mikhail Samin (@mihonarium)'s Twitter Profile Photo

🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony.

According to a Coordinator on Problem 6, the one problem OpenAI