Mislav Balunović

I am a Senior Research Scientist at Google DeepMind. Previously, I created MathArena, a benchmark for evaluating LLMs on the latest math competitions, which is now used by leading labs such as Google DeepMind, xAI and Microsoft. I received my PhD from ETH Zurich, during which my work was featured in media such as Ars Technica, Forbes and Wired. During high school I won a gold medal at the IMO and silver medals at the IOI. I also spent time in industry, working on securing AI agents at a startup and doing internships at Twitter, Facebook and SigOpt.

news

Sep 01, 2025	I joined Google DeepMind as a Senior Research Scientist.
May 07, 2025	My work on AgentDojo, a benchmark for AI agent safety, won first prize in the SafeBench competition (50,000 USD).
May 02, 2025	MathConstruct: Challenging LLM Reasoning with Constructive Proofs has been accepted to ICML 2025!
Apr 15, 2025	I am presenting two papers at ICLR 2025 in Singapore: Language Models are Advanced Anonymizers and MathConstruct: Challenging LLM Reasoning with Constructive Proofs.
Feb 07, 2025	We released matharena.ai, a website for evaluating LLMs on the latest math competitions.

selected publications

arXiv

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, and Martin Vechev

arXiv, 2025

arXiv

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, and Martin Vechev

arXiv, 2025

ICML

MathConstruct: Challenging LLM Reasoning with Constructive Proofs

Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, and Martin T. Vechev

In ICML, 2025
