Wei-Lin Chiang (@infwinston)'s Twitter Profile
Wei-Lin Chiang

@infwinston

CS PhD student at UC Berkeley. Building LMArena @lmarena_ai @lmsysorg

ID: 490518336

https://infwinston.github.io/ · Joined 12-02-2012 16:49:32

563 Tweets

3.3K Followers

893 Following

Aydin Senkut (@asenkut)

Super excited to partner with Ion Stoica, Anastasios Nikolas Angelopoulos & Wei-Lin Chiang and be part of the lmarena.ai (formerly lmsys.org) mega seed round, as it has rapidly evolved from an ambitious academic project into a critical cornerstone of AI model evaluations, loyally used by all the major players. In Google IO

Wei-Lin Chiang (@infwinston)

Super excited to share that the NEW LMArena is now live. Huge shoutout to the team for all the hard work! Check it out and let us know your feedback.

Robert Weber (@robertnweber)

Huge launch from the LMArena team today 🎉 They’ve rebuilt the platform from the ground up with faster UI, mobile support, multimodal evals, and a cleaner leaderboard experience. All open and community-driven. And they're hiring! jobs.ashbyhq.com/lmarena

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Breaking: Claude Opus 4 jumps to #1 in WebDev Arena!

A strong comeback from Anthropic: Opus 4 and Sonnet 4 are now on top of the chart, surpassing the previous Claude 3.7 and matching Gemini 2.5 Pro.

Massive congrats to Anthropic🔥

Anjney Midha 🇺🇸 (@anjneymidha)

1/ Humanity doesn't need more AI benchmarks. We need real-time, real-world, continuous testing of AI systems. I sat down with Ion Stoica, Wei-Lin Chiang, and Anastasios Nikolas Angelopoulos to unpack what lmarena.ai is building and why it's critical for AI reliability

a16z (@a16z)

The future of AI evaluation: real-world feedback, from real users. lmarena.ai makes that possible: models tested side by side, in public, and voted on by the people who use them. Hear how it started — and why human preference is the foundation of reliable AI in the full

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Exciting news: OpenAI’s GPT-Image-1 takes the #1 spot in the Text-to-Image Arena! 🖼️🏆

➤ Outperforms Google’s Imagen-3.0 by 50+ points
➤ Major leap over DALL·E 3

Huge congrats to OpenAI! 👏

johnnn (@johnnnavent)

Hello friends, at lmarena.ai we're on the lookout for designers who are obsessed with craft, comfortable with complexity and have a track record of creating interfaces people love. If you're interested in helping us mold this product and brand into something special, send me a

Andy Konwinski (@andykonwinski)

Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including Jeff Dean & Joelle Pineau on the board, Laude Institute catalyzes research with real-world impact.

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

🚨 Breaking News: Grok 4's result is now live!

With 4k+ community votes, xAI’s Grok-4 tied for #3 overall in Text Arena, a huge leap from Grok-3. It scores Top-3 across all categories (#1 in Math, #2 in Coding, #3 in Hard Prompts).

Detailed analysis in the thread 🧵

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

We’re delivering a bundle of polish to the LMArena experience, most of them inspired directly by your feedback 💬 Here’s a look at what’s new👇

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

🧵Top 10 Open Models by Provider

Though proprietary models often top the charts, open models are also paired in battle mode and ranked on our public leaderboards.

Here are the top 10 when stacked by top open model by provider.

- #1 Kimi K2 (Modified MIT) Kimi.ai
- #2

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Exciting Text-to-Image leaderboard update!

Two new Imagen 4.0 models from Google DeepMind just dropped:
🥇 Imagen 4.0 Ultra (v2) ties at #1 with OpenAI’s GPT-Image-1
🥉 Imagen 4.0 (v2) lands strong at #3

Congrats to the Google Imagen team!

Aäron van den Oord (@avdnoord)

We updated our Imagen 4 models and Ultra is tied for #1 on the lmarena leaderboard! The models are available in Google AI Studio and the Gemini API - try them out and let us know what you think.

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

We've been busy lately: new arenas, new models, and new methodologies! So we've created a changelog page where you can track all the updates we make to the leaderboards. In addition to the new Search Arena, and new models like the latest Imagen 4, Grok 4, Kimi K2, Seedream 3 and