Alexandr Wang (@alexandr_wang)'s Twitter Profile
Alexandr Wang

@alexandr_wang

ceo at @scale_ai. rational in the fullness of time

ID: 615818451

Website: https://scale.com | Joined: 23-06-2012 02:46:22

2.2K Tweets

241,241K Followers

803 Following

AI improvement at the labs has evolved from a guessing game to one driven by precision-targeted fixes to identified failure modes. Scale AI's SEAL research benchmarks and evaluation platform are making this possible. Check out coverage by Will Knight wired.com/story/this-too…

🚨 Narrative Violation—DeepSeek V3 is a competitive, but NOT top model.

SEAL leaderboards have been updated with DeepSeek V3 (Mar 2025).

- 8th on Humanity’s Last Exam (text-only).
- 12th on MultiChallenge (multi-turn).

View the full rankings: scale.com/leaderboard

Congrats to OpenAI on the successful launch of their GPT-4.1 model :)

Great to see their team utilize the Scale AI MultiChallenge benchmark to measure multi-turn instruction following.

🚨 OpenAI has launched o3 and o4-mini! 🎉

o3 is absolutely dominating the SEAL leaderboard with #1 rankings in:

🥇: HLE
🥇: MultiChallenge (multi-turn)
🥇: MASK (honesty under pressure)
🥇: ENIGMA (puzzle solving)

Congrats Sam Altman, Mark Chen & team

🔗: scale.com/leaderboard

OpenAI o3 is a genuine, meaningful step forward for the industry. Emergent agentic tool use working seamlessly via scaling RL is a big breakthrough. It is genuinely incredible how consistently OpenAI delivers new miracles. Kudos to Sam Altman and the team!

Good discussion on AI leadership with President Trump and my colleagues from across the AI industry 🤝

Thanks to our Saudi hosts as well!