Carlo (@carlobaronio)'s Twitter Profile
Carlo

@carlobaronio

fine-tuning math & physics @stanford

ID: 1752197361072496640

Joined: 30-01-2024 05:10:08

61 Tweets

25 Followers

101 Following

Inception Labs (@inceptionailabs):

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
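
The "parallel, coarse-to-fine" idea can be sketched in a few lines: instead of emitting tokens left to right, a masked diffusion LM starts from all masks and unmasks the most confident positions over a fixed number of steps. Below is a toy sketch with a random stand-in model; Mercury's actual sampler is not public, and toy_model is an invented placeholder.

```python
# Toy sketch of coarse-to-fine parallel decoding, in the style of a masked
# diffusion LM. Illustrative only; not Mercury's actual sampler.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "a", "dog"]
MASK = "[MASK]"

def toy_model(tokens):
    """Stand-in for the denoiser: returns (token, confidence) per position.
    A real dLLM predicts all masked positions in one forward pass."""
    return [(random.choice(VOCAB), random.random()) if t == MASK else (t, 1.0)
            for t in tokens]

def diffusion_decode(length=8, steps=4):
    tokens = [MASK] * length
    for _ in range(steps):
        preds = toy_model(tokens)
        # Unmask the most confident positions each step: coarse first, fine later.
        k = max(1, length // steps)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:k]:
            tokens[i] = preds[i][0]
    # Fill any positions still masked after the fixed step budget.
    preds = toy_model(tokens)
    return " ".join(preds[i][0] for i in range(length))

print(diffusion_decode())
```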

Jia Li (@jiali52524397):

We believe formal math is the future.
🔥 Introducing Kimina-Prover Preview, a Numina & Kimi.ai collaboration, the first large formal reasoning model for Lean 4, achieving 80.78% on miniF2F.
github.com/MoonshotAI/Kim…
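
For context on what such provers target: a miniF2F entry is a competition-style statement in Lean whose proof must be machine-checked. A minimal toy goal in Lean 4 with Mathlib is shown below; it is illustrative only, not a problem from the Kimina-Prover release.

```lean
-- Toy miniF2F-style goal (illustrative only; not from the actual benchmark).
import Mathlib

theorem algebra_toy (x : ℝ) (h : 2 * x + 3 = 11) : x = 4 := by
  linarith
```
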
Jia Li (@jiali52524397):

Combinatorics problems were the last two left unsolved by AlphaProof at last year's IMO. Introducing CombiBench (Kimi.ai), a benchmark focusing on combinatorics problems! 🔥
🏆 moonshotai.github.io/CombiBench/
📘 Dataset -> huggingface.co/datasets/AI-MO…

Zhihong Shao (@zhs05232838):

We just released DeepSeek-Prover V2.
- Solves nearly 90% of miniF2F problems
- Significantly improves the SoTA performance on PutnamBench
- Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version

GitHub: github.com/deepseek-ai/De…
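
"AIME problems in their formal version" means the integer answer is baked into the theorem statement and the prover must establish it in Lean. A hedged toy sketch of that shape follows; it is not an actual AIME 24/25 problem.

```lean
-- Toy sketch of an answer-based problem "in formal version": the numeric
-- answer (24) is fixed in the statement. Hypothetical example, not real AIME.
import Mathlib

theorem aime_style_toy : (2 ^ 10) % 1000 = 24 := by decide
```
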
Carlo (@carlobaronio):

Had fun training Kevin!
We explored multi-turn training to help models learn longer-horizon dynamics, and kernel generation seemed a very nice environment to try out our ideas—a step closer to coding agents! 🚀
It turns out that maybe you don't need insanely long context...
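
A multi-turn setup like the one described can be sketched as a rollout loop in which the model refines a kernel across turns using benchmark feedback. The sketch below uses invented placeholders (generate, benchmark_kernel); it is not Kevin's actual training code.

```python
# Minimal sketch of a multi-turn rollout for kernel generation.
# generate() and benchmark_kernel() are hypothetical stand-ins.
from dataclasses import dataclass, field

@dataclass
class Turn:
    kernel_src: str
    feedback: str
    reward: float

@dataclass
class Trajectory:
    prompt: str
    turns: list[Turn] = field(default_factory=list)

def generate(prompt: str, history: list[Turn]) -> str:
    """Stand-in for the policy model conditioned on prior turns' feedback."""
    return f"// kernel attempt {len(history) + 1} for: {prompt}"

def benchmark_kernel(src: str) -> tuple[float, str]:
    """Stand-in for compile-and-benchmark: returns (reward, feedback)."""
    return 0.5, "compiled; 1.2x speedup over baseline"

def rollout(prompt: str, max_turns: int = 4) -> Trajectory:
    traj = Trajectory(prompt)
    for _ in range(max_turns):
        src = generate(prompt, traj.turns)        # model refines using feedback
        reward, feedback = benchmark_kernel(src)  # environment signal per turn
        traj.turns.append(Turn(src, feedback, reward))
    return traj

traj = rollout("fuse softmax + matmul")
print(len(traj.turns), traj.turns[-1].reward)
```
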
vLLM (@vllm_project):

Great work! We love how vLLM is used in the rollout process, offloading the engine to CPU and giving the GPU back to the kernel being benchmarked! This is a small feature we implemented to make RLHF smoother with vLLM.
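
A rough sketch of the pattern described, assuming vLLM's sleep-mode API (enable_sleep_mode, llm.sleep, llm.wake_up): the engine is put to sleep to free GPU memory for benchmarking the generated kernel, then woken for the next rollout. The model name and benchmark hook are placeholders; check your vLLM version before relying on this API.

```python
# Hedged sketch: free the GPU for kernel benchmarking between rollouts.
from vllm import LLM, SamplingParams

def run_kernel_benchmark():
    """Placeholder for compiling and timing the generated kernel."""
    pass

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)
params = SamplingParams(max_tokens=256)

outputs = llm.generate(["Write a fused softmax CUDA kernel."], params)

llm.sleep(level=1)      # offload weights to CPU, freeing GPU memory
run_kernel_benchmark()  # kernel now has the GPU to itself
llm.wake_up()           # restore the engine for the next rollout
```
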
Morph (@morph_labs):

We are excited to announce Trinity, an autoformalization system for verified superintelligence that we have developed at Morph. We have used it to automatically formalize in Lean a classical result of de Bruijn that the abc conjecture is true almost always.

MiniMax (official) (@minimax__ai):

Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning.

- World’s longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency:

Sakana AI (@sakanaailabs):

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL).
Blog: sakana.ai/rlt
Paper: arxiv.org/abs/2506.08388
Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and

Jia Li (@jiali52524397):

Happy to introduce Kimina-Prover-72B! Reaching 92.2% on miniF2F using test-time RL. It can solve IMO problems using more than 500 lines of Lean 4 code!

Check our blog post here:
huggingface.co/blog/AI-MO/kim…
And play with our demo!
demo.projectnumina.ai

Yong Lin (@yong18850571):

(1/4) 🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B

Kaiyu Yang (@kaiyuyang4):

Our Goedel-Prover-V2 doubled the SOTA Pass@32 performance on PutnamBench with a 20x smaller model, making it the strongest open-source theorem prover to date!
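
For context on metrics like Pass@32: the standard unbiased pass@k estimator (Chen et al., 2021) takes n samples per problem with c successes and estimates the probability that at least one of k draws succeeds. A minimal sketch follows; whether a given prover reports this estimator or raw success over exactly k samples varies by paper.

```python
# Unbiased pass@k estimator from the Codex paper (Chen et al., 2021).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples passes, given
    n total samples of which c passed."""
    if n - c < k:
        return 1.0  # every k-subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 64 sampled proofs per problem, 5 verified, report pass@32.
print(round(pass_at_k(n=64, c=5, k=32), 4))
```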

Noam Brown (@polynoamial):

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

Jerry Tworek (@millionint):

Why am I excited about the IMO results we just published:
- we did very little IMO-specific work, we just keep training general models
- all natural language proofs
- no evaluation harness
We needed a new research breakthrough and Alexander Wei and team delivered.

Alexander Wei (@alexwei_):

On IMO P6 (without going into too much detail about our setup), the model "knew" it didn't have a correct solution. The model knowing when it didn't know was one of the early signs of life that made us excited about the underlying research direction!