Mohamed Osman (@mohamedosmanml) Twitter Tweets • TwiCopy

ARC Prize

a year ago

ARC Prize builds on the legacy of past competitions co-hosted by François Chollet & Lab42. Thank you to Rolf Pfister, Hansueli Jud, and Oliver Schmid for their invaluable contributions this year. Thank you to ARC-AGI evangelists Michael Hodel, Jack Cole, Mohamed Osman,

Likes: 31 · Replies: 1 · Reposts: 3

Mohamed Osman

@mohamedosmanml

a year ago

✈️ Heading to #NeurIPS2024 today! Excited to discuss ARC, reasoning, test-time tuning, other test-time methods, anything and everything. DM me if you’re around and want to chat!

Likes: 12 · Replies: 1 · Reposts: 4

Henry Mao

@calclavia

9 months ago

Andrej Karpathy Agency is like the optimizer. Intelligence is like the outcome of optimization.

Likes: 2 · Replies: 0 · Reposts: 1

Akira Yoshiyama ⁂

@yoshiyama_akira

8 months ago

Happy to announce we outperformed OpenAI o1 with a 7B model :) We released two self-improvement methods for verifiable domains in our preliminary paper -->

Likes: 3.3K · Replies: 108 · Reposts: 254

Mohamed Osman

@mohamedosmanml

8 months ago

Honored to be a guest on the infamous MLST podcast again! We discuss our test-time methods, compositionality in LLMs, limitations of VLMs, logic vs perception, efficient adaptation, and more. Machine Learning Street Talk youtu.be/3p0O28W1ZHg

Likes: 21 · Replies: 6 · Reposts: 9

Toby Simonds

@tobyrsimonds

8 months ago

🚀 NEW RESEARCH 🚀 🧠 To push RL further we need a lot more questions. The issue? Current datasets have only a few hundred thousand—and for domains outside math, there's barely anything. Our breakthrough? Turning everyday textbooks into limitless RL training gold 📚✨ A thread on

Likes: 112 · Replies: 3 · Reposts: 15

Jack Cole

@mindsai_jack

7 months ago

ARC-AGI V2 is quite challenging. We're happy to be back at the top at 12.36! Mohamed Osman Michael Hodel Tufalabs ARC Prize

Likes: 162 · Replies: 9 · Reposts: 12

Toby Simonds

@tobyrsimonds

7 months ago

🚀 New paper: LLMs for Engineering: Teaching Models to Design High-Powered Rockets 🚀 We built an environment to allow models to build high-powered rockets and show that, by using RL, models can surpass human designs!

Likes: 16 · Replies: 2 · Reposts: 4

Jack Cole

@mindsai_jack

6 months ago

Excited to advance our lead and SoTA score on ARC-AGI-2 (ARC Prize) by 3 points to 15.28. Dries Smit Mohamed Osman Michael Hodel Greg Kamradt Tufalabs kaggle.com/competitions/a…

Likes: 137 · Replies: 4 · Reposts: 16

ARC Prize

@arcprize

6 months ago

New ARC Prize 2025 High Score 15.3% by Jack Cole, Mohamed Osman, Tufalabs

Likes: 219 · Replies: 7 · Reposts: 15

Arnaud Bertrand

@rnaudbertrand

6 months ago

I just read this WSJ article on why Europe's tech scene is so much smaller than the US's and China's. I'm afraid that, like most articles on this topic, it largely misses the mark. Which in itself illustrates a key reason why Europe is lagging behind: when you fail to

Likes: 6.6K · Replies: 671 · Reposts: 1.1K

Kevin Ellis

@ellisk_kellis

5 months ago

New paper: World models + Program synthesis by Wasu Top Piriyakulkij 1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code 2. Learns new environments from minutes of experience 3. Positive score on Montezuma's Revenge 4. Compositional generalization to new environments

Likes: 556 · Replies: 14 · Reposts: 100

Rohan Paul

@rohanpaul_ai

5 months ago

Deep learning alone now cracks 58% of the hidden ARC test after adding on‑the‑fly tuning, proving the paradigm can invent new abstractions during inference. The work shows that a neural network can tackle ARC once the optimizer is treated as part of inference, meaning the model

Likes: 66 · Replies: 4 · Reposts: 9