Anastasios Nikolas Angelopoulos (@ml_angelopoulos) Twitter Tweets • TwiCopy

Anastasios Nikolas Angelopoulos

@ml_angelopoulos


Building Chatbot Arena.

Black-box statistics, model testing.

@Berkeley_EECS Ph.D., former student researcher @GoogleDeepMind and @stanford_ee alum.

ID: 1170098132283035648

https://people.eecs.berkeley.edu/~angelopoulos/ · Joined: 06-09-2019 22:15:34

1.1K Tweets

4.4K Followers

1.1K Following

Anastasios Nikolas Angelopoulos

@ml_angelopoulos

6 months ago

After many months of development, LMArena has transitioned to a new user experience. It's way better; check it out, and congrats to the team that made it possible. :)

70 likes · 1 reply · 3 reposts

Wei-Lin Chiang

@infwinston

6 months ago

Super excited to share that the NEW LMArena is now live! Huge shoutout to the team for all the hard work. Check it out and let us know your feedback.

21 likes · 1 reply · 1 repost

Bucky Moore

Arena is building one of the most valuable data assets in AI: a real-time, community-driven record of how models perform in the wild. This ground-level signal has quickly become an indispensable ingredient in how frontier labs improve their products. Lightspeed is thrilled

31 likes · 1 reply · 1 repost

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

Breaking: Claude Opus 4 jumps to #1 in WebDev Arena!

A strong comeback from Anthropic: Opus 4 and Sonnet 4 are now on top of the chart, surpassing the previous Claude 3.7 and matching Gemini 2.5 Pro.

Massive congrats to Anthropic🔥

697 likes · 19 replies · 67 reposts

Lars Lindemann

@larslindemann2

6 months ago

I gave an in-depth tutorial on "Formal Verification and Control with Conformal Prediction" at KTH today 🚀 Since I got positive feedback, I wanted to share the presentation, in the hope that others can also benefit from it 🙂 Find the recording here: youtu.be/kfPBjaMCXmM?si…

41 likes · 0 replies · 8 reposts

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

DeepSeek R1-0528 has landed in the Arena! 🐳 Rumblings say this version has major improvements in reasoning and output, but let's see what you think.🫵 Congrats to the DeepSeek team on this release update 👏

333 likes · 5 replies · 33 reposts

Laude Ventures

@laudeventures

6 months ago

Proud to support Anastasios Nikolas Angelopoulos, Wei-Lin Chiang, and Ion Stoica as lmarena.ai lays the groundwork for a more rigorous and accountable AI ecosystem. newblog.lmarena.ai/new-lmarena/

18 likes · 0 replies · 4 reposts

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

Welcome to the Image Arena: FLUX.1 Kontext! 🖼️ 🎨 You'll find that FLUX.1 Kontext Pro can generate AND edit images. Congrats to Black Forest Labs on this exciting release. 🌲👏🌲 Check it out in the Arena, and get voting!

105 likes · 5 replies · 17 reposts

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

Exciting update today: Anthropic’s Claude 4 Opus jumps into the Top 4🔥

Claude Opus 4 ranks #4 overall (text)
↳ Tied #1🏆 in Coding, Hard Prompts, Writing, Longer Query
↳ Tied #1🏆 in WebDev Arena

Claude Sonnet 4 hits #7 overall (text)
↳ #2 in Coding, #1 in Longer Query

271 likes · 12 replies · 22 reposts

Václav Voráček

@vaclavvoracekcz

6 months ago

With Francesco Orabona, we propose a new algorithm for constructing confidence intervals for the means of bounded random variables using the "testing by betting" framework. It performs remarkably well even in the challenging very-small-sample regime (and, of course, it is great in the large-sample one).

46 likes · 1 reply · 4 reposts

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

Image Editing just got real on LMArena 🖼️✨ Introducing Image Edit Arena: where AI editing models go head-to-head on your images. Upload, edit, vote. It's that simple. Who edits it best? You decide🫵 Learn how it works in thread 🧵

195 likes · 5 replies · 25 reposts

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

Exciting news: OpenAI’s GPT-Image-1 takes the #1 spot in the Text-to-Image Arena! 🖼️🏆

➤ Outperforms Google’s Imagen-3.0 by 50+ points
➤ Major leap over DALL·E 3

Huge congrats to OpenAI! 👏

307 likes · 11 replies · 21 reposts
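The "50+ points" margin in the GPT-Image-1 post is a rating gap. As a rough illustration, assuming Arena scores follow the conventional Elo-style (Bradley-Terry) parameterization with base 10 and scale 400 (an assumption, not something the post states), a rating gap maps to an expected head-to-head win rate like this:

```python
def win_probability(rating_gap: float, scale: float = 400.0, base: float = 10.0) -> float:
    """Expected win rate of the higher-rated model over the lower-rated one.

    Assumption: scores use the standard Elo-style parameterization
    (base 10, scale 400); the actual Arena fitting details may differ.
    """
    return 1.0 / (1.0 + base ** (-rating_gap / scale))


if __name__ == "__main__":
    # Print the implied win rate for a few illustrative rating gaps.
    for gap in (0, 50, 100):
        print(f"gap={gap:>3} -> expected win rate {win_probability(gap):.3f}")
```

Under these assumptions, a 50-point gap corresponds to the higher-rated model being preferred in roughly 57% of pairwise votes, which is a clear but not overwhelming margin.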

johnnn

@johnnnavent

6 months ago

Hello friends, at lmarena.ai we're on the lookout for designers who are obsessed with craft, comfortable with complexity and have a track record of creating interfaces people love. If you're interested in helping us mold this product and brand into something special, send me a

7 likes · 1 reply · 2 reposts

lmarena.ai (formerly lmsys.org)

@lmarena_ai

6 months ago

🚨Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again!

🥇 #1 in Text, Vision, WebDev
🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories

Huge congrats, Google DeepMind!

1.1K likes · 20 replies · 134 reposts

Mir Miroyan

@mirmiroyan

6 months ago

We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs. We also share a comprehensive report on user preferences and model performance in the search-enabled setting. Paper, dataset, and code in 🧵

217 likes · 5 replies · 39 reposts

TBPN

@tbpn

6 months ago

We asked Anastasios Nikolas Angelopoulos about the thesis behind LMArena. "Benchmarking is entering a new age. The way people are using AI is so broad that you could never annotate all of it with datasets." "There's like 1M different things that you should be evaluating. Not just image

25 likes · 0 replies · 2 reposts

Tsung-Han (Patrick) Wu @ ICLR’25

@tsunghan_wu

6 months ago

Search-augmented LLMs 🌐 are changing how people ask, judge, and trust. We dropped 24k+ real human battles ⚔️ across top models. Wanna know what people actually ask and what they want in return? Check out the paper + dataset 👇

12 likes · 0 replies · 2 reposts

Léo Andéol

@leoandeol

6 months ago

Very happy to announce that our work on Conformal Object Detection with Sequential Conformal Risk Control (able to control risks of multiple parameters!) is now available on ArXiv at: arxiv.org/abs/2505.24038

4 likes · 1 reply · 1 repost
