Mislav Balunović

Researcher at ETH Zurich and INSAIT

I am an AI researcher currently focusing on reasoning, AI agents and AI applications in math. I’ve created MathArena, a benchmark for evaluating LLMs on the latest math competitions, which is now used by leading labs such as Google DeepMind, xAI and Microsoft. Previously, I received my PhD at ETH Zurich, during which my work was featured in media such as Ars Technica, Forbes and Wired. During high school I won a gold medal at the IMO and silver medals at the IOI. I also spent time in industry working on securing AI agents at a startup and doing internships at Twitter, Facebook and SigOpt.

Research Interests

My current research interests are the following:

  • AI for mathematics: Given my experience at the IMO and IOI, I am interested in extending the recent improvements in reasoning to more difficult, proof-based mathematical problems. I believe that the right approach starts with reliable evaluation (see MathArena) and combines informal chain-of-thought reasoning with formal solvers used as tools.

  • AI agents for science: Beyond mathematics, I am interested in building general agents that can combine informal reasoning with tool usage such as code execution and formal solvers to solve more general science problems.

  • Safe and reliable AI agents: I am also interested in applying formal methods to guarantee the safety of AI agents. I’ve done some initial work pursuing this idea, which you can find here. My earlier work on evaluating AI agent safety recently won first prize in the SafeBench competition (50 000 USD).

news

May 07, 2025 My work on AgentDojo, a benchmark for AI agent safety, won first prize in the SafeBench competition (50 000 USD).
May 02, 2025 MathConstruct: Challenging LLM Reasoning with Constructive Proofs has been accepted to ICML 2025!
Apr 15, 2025 I am presenting 2 papers at ICLR 2025 in Singapore: Language Models are Advanced Anonymizers and MathConstruct: Challenging LLM Reasoning with Constructive Proofs.
Feb 07, 2025 We released matharena.ai, a website for evaluating LLMs on the latest math competitions.

selected publications

  1. Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
    Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, and Martin Vechev
    arXiv, 2025
  2. MathConstruct: Challenging LLM Reasoning with Constructive Proofs
    Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, and Martin T. Vechev
    In ICML, 2025
  3. AI Agents with Formal Security Guarantees
    Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, and Martin Vechev
    In ICML 2024 Next Generation of AI Safety Workshop, 2024