
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Building Chatbot Arena.
Black-box statistics, model testing.
@Berkeley_EECS Ph.D., former student researcher @GoogleDeepMind and @stanford_ee alum.
ID: 1170098132283035648
https://people.eecs.berkeley.edu/~angelopoulos/
Joined 06-09-2019 22:15:34
1.1K Tweets
4.4K Followers
1.1K Following

Arena is building one of the most valuable data assets in AI: a real-time, community-driven record of how models perform in the wild. This ground-level signal has quickly become an indispensable ingredient in how frontier labs improve their products. Lightspeed is thrilled

Proud to support Anastasios Nikolas Angelopoulos, Wei-Lin Chiang, and Ion Stoica as lmarena.ai lays the groundwork for a more rigorous and accountable AI ecosystem. newblog.lmarena.ai/new-lmarena/

Welcome to the Image Arena: FLUX.1 Kontext! 🖼️ 🎨 You'll find that FLUX.1 Kontext Pro can generate AND edit images. Congrats to Black Forest Labs on this exciting release. 🌲👏🌲 Check it out in the Arena, and get voting!

With Francesco Orabona, we propose a new algorithm for constructing confidence intervals for the means of bounded random variables using the "testing by betting" framework. It performs remarkably well even in the challenging very-small-sample regime (and, of course, it is great in the large-sample regime too).
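The tweet does not spell out the algorithm, so the block below is only a generic sketch of the "testing by betting" idea, not the method it announces. It assumes observations lie in [0, 1] and share a common mean. For each candidate mean m, two bettors start with wealth 1, one betting the mean is above m and one betting it is below. If m is the true mean, their averaged wealth has expectation 1, so by Markov's inequality it reaches 1/alpha with probability at most alpha. Every m whose wealth stays below 1/alpha is kept. The bet-size rule, the cap `c`, the grid search, and the names `betting_ci`, `grid`, and `c` are all hypothetical choices made for illustration.

```python
import math


def betting_ci(xs, alpha=0.05, grid=1000, c=0.5):
    """Hypothetical sketch: betting-based confidence interval for the mean of data in [0, 1].

    Not the algorithm from the tweet; a generic two-sided construction with an
    assumed plug-in bet size. Returns (lo, hi) up to grid resolution, or None.
    """
    n = len(xs)
    kept = []
    for k in range(grid):
        m = (k + 0.5) / grid  # candidate mean, strictly inside (0, 1)
        w_up, w_down = 1.0, 1.0  # wealth of the "mean > m" and "mean < m" bettors
        var_est = 0.25  # predictable variance estimate, starting from a prior value
        s_x, s_sq = 0.0, 0.0
        for i, x in enumerate(xs, start=1):
            # Bet size depends only on past data (predictable). Capping it at
            # c/m and c/(1-m) keeps every wealth factor >= 1 - c > 0.
            lam = math.sqrt(2 * math.log(2 / alpha) / (n * var_est))
            w_up *= 1 + min(lam, c / m) * (x - m)
            w_down *= 1 - min(lam, c / (1 - m)) * (x - m)
            # Update the running mean/variance estimates only after betting on x.
            s_x += x
            mean_est = (0.5 + s_x) / (i + 1)
            s_sq += (x - mean_est) ** 2
            var_est = (0.25 + s_sq) / (i + 1)
        # Averaged wealth has expectation 1 under H0: mean == m, so Markov's
        # inequality bounds P(wealth >= 1/alpha) by alpha.
        if 0.5 * w_up + 0.5 * w_down < 1 / alpha:
            kept.append(m)
    # Hull of the non-rejected candidates; None only if every candidate was rejected.
    return (min(kept), max(kept)) if kept else None


lo, hi = betting_ci([0.5] * 50)
print(f"95% CI for the mean: [{lo:.3f}, {hi:.3f}]")
```

Any predictable bet size keeps the coverage guarantee. The bet size only affects how quickly wrong candidates are rejected, which is where a carefully designed betting strategy, presumably the focus of the announced algorithm, earns its small-sample performance.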

Hello friends, at lmarena.ai we're on the lookout for designers who are obsessed with craft, comfortable with complexity and have a track record of creating interfaces people love. If you're interested in helping us mold this product and brand into something special, send me a

🚨 Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again!
🥇 #1 in Text, Vision, WebDev
🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories
Huge congrats, Google DeepMind!

We asked Anastasios Nikolas Angelopoulos about the thesis behind LMArena. "Benchmarking is entering a new age. The way people are using AI is so broad that you could never annotate all of it with datasets." "There's like 1M different things that you should be evaluating. Not just image

