Anastasios Nikolas Angelopoulos (@ml_angelopoulos)'s Twitter Profile
Anastasios Nikolas Angelopoulos

@ml_angelopoulos

Building Chatbot Arena.

Black-box statistics, model testing.

@Berkeley_EECS Ph.D., former student researcher @GoogleDeepMind and @stanford_ee alum.

ID: 1170098132283035648

Website: https://people.eecs.berkeley.edu/~angelopoulos/

Joined: 06-09-2019 22:15:34

1.1K Tweets

4.4K Followers

1.1K Following

Anastasios Nikolas Angelopoulos (@ml_angelopoulos)

After many months of development, LMArena has transitioned to a new user experience. It's way better, check it out, and congrats to the team that made it possible. :)

Wei-Lin Chiang (@infwinston)

Super excited to share the NEW LMArena is now live. Huge shoutout to the team for all the hard work! Check it out and let us know your feedback.

Bucky Moore (@buckymoore)

Arena is building one of the most valuable data assets in AI: a real-time, community-driven record of how models perform in the wild. This ground-level signal has quickly become an indispensable ingredient in how frontier labs improve their products. Lightspeed is thrilled

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Breaking: Claude Opus 4 jumps to #1 in WebDev Arena!

A strong comeback from Anthropic - Opus 4 and Sonnet 4 now on top of the chart, surpassing previous Claude 3.7 and matching Gemini 2.5 Pro.

Massive congrats to Anthropic🔥

Lars Lindemann (@larslindemann2)

I gave an in-depth tutorial on "Formal Verification and Control with Conformal Prediction" at KTH today 🚀 Since I got positive feedback, I wanted to share the presentation, in the hope that others can also benefit from it 🙂 Find the recording here: youtu.be/kfPBjaMCXmM?si…

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

DeepSeek R1-0528 has landed in the Arena! 🐳
Rumblings say this version has major improvements in reasoning and output, but let's see what you think.🫵

Congrats to the DeepSeek team on this release update 👏

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Welcome to the Image Arena: FLUX.1 Kontext! 🖼️ 🎨 You'll find that FLUX.1 Kontext Pro can generate AND edit images. Congrats to Black Forest Labs on this exciting release. 🌲👏🌲 Check it out in the Arena, and get voting!

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Exciting update today: Anthropic’s Claude 4 Opus jumps into the Top 4🔥

Claude Opus 4 ranks #4 overall (text)
↳ Tied #1🏆 in Coding, Hard Prompts, Writing, Longer Query
↳ Tied #1🏆 in WebDev Arena

Claude Sonnet 4 hits #7 overall (text)
↳ #2 in Coding, #1 in Longer Query

Václav Voráček (@vaclavvoracekcz)

With Francesco Orabona, we propose a new algorithm for constructing confidence intervals for means of bounded random variables using the "testing by betting" framework. It performs remarkably well even in the challenging, very small sample regime (and of course, it is great in the large sample one).

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Image Editing just got real on LMArena 🖼️✨

Introducing Image Edit Arena: where AI editing models go head-to-head on your images. Upload, edit, vote. It's that simple.

Who edits it best? You decide🫵

Learn how it works in thread 🧵

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Exciting news: OpenAI’s GPT-Image-1 takes the #1 spot in the Text-to-Image Arena! 🖼️🏆

➤ Outperforms Google’s Imagen-3.0 by 50+ points
➤ Major leap over DALL·E 3

Huge congrats to OpenAI! 👏

johnnn (@johnnnavent)

Hello friends, at lmarena.ai we're on the lookout for designers who are obsessed with craft, comfortable with complexity and have a track record of creating interfaces people love. If you're interested in helping us mold this product and brand into something special, send me a

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

🚨Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again!

🥇 #1 in Text, Vision, WebDev
🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories

Huge congrats Google DeepMind!

TBPN (@tbpn)

We asked Anastasios Nikolas Angelopoulos about the thesis behind LMArena. "Benchmarking is entering a new age. The way people are using AI is so broad that you could never annotate all of it with datasets." "There's like 1M different things that you should be evaluating. Not just image

Tsung-Han (Patrick) Wu @ ICLR’25 (@tsunghan_wu)

Search-augmented LLMs 🌐 are changing how people ask, judge, and trust. We dropped 24k+ real human battles ⚔️ across top models. Wanna know what people actually ask and what they want in return? Check out the paper + dataset 👇

Léo Andéol (@leoandeol)

Very happy to announce that our work on Conformal Object Detection with Sequential Conformal Risk Control (able to control risks of multiple parameters!) is now available on arXiv at: arxiv.org/abs/2505.24038