SRI Lab (@the_sri_lab) Twitter Tweets • TwiCopy

SRI Lab

@the_sri_lab

ID: 1051516728553996288

Link: https://www.sri.inf.ethz.ch/
Joined: 14-10-2018 16:54:59

189 Tweets

714 Followers

166 Following

Mislav Balunović

@mbalunovic

10 months ago

MathArena results for HMMT Feb 2025 are out, showing that high school math competitions are still far from being solved by frontier LLMs, with only o3-mini crossing the 50% mark!

Likes: 23, Replies: 1, Reposts: 3

How good are LLMs at producing constructive proofs? In our latest paper we introduce MathConstruct, a benchmark consisting of challenging olympiad-level problems whose solutions require a proof by construction.

Likes: 14, Replies: 1, Reposts: 4

Mislav Balunović

@mbalunovic

9 months ago

Can LLMs actually solve hard math problems? Given the strong performance at AIME, we now go to the next tier: our MathArena team has conducted a detailed evaluation using the recent 2025 USA Math Olympiad. The results are… bad: all models scored less than 5%!

Likes: 503, Replies: 18, Reposts: 85

Mislav Balunović

@mbalunovic

9 months ago

Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial number of points (24.4%). The speed of progress is really mind-blowing.

Likes: 1.1K, Replies: 34, Reposts: 145

SRI Lab

@the_sri_lab

7 months ago

Check out this recent work from our lab showing that benign-looking LLMs can hide backdoors that activate upon finetuning!

Likes: 4, Replies: 0, Reposts: 0