OpenAI introduces BrowseComp, a benchmark for AI internet browsing

Published 2025-04-10, 03:26 PM
© Reuters.

Investing.com -- OpenAI has announced the launch of BrowseComp, an open-source benchmark designed to test the ability of AI agents to browse the internet to locate hard-to-find information. The benchmark, which is available in OpenAI's simple evals GitHub repository, consists of 1,266 challenging problems.

BrowseComp is designed to measure the ability of AI agents to locate complex, intertwined information on the internet. AI agents that can gather knowledge by browsing the internet are becoming increasingly valuable. A competent browsing agent should be able to locate information that is difficult to find, potentially requiring the browsing of tens or even hundreds of websites.

The benchmark was created to be both challenging for models and easy to verify. It focuses on questions that have a short answer and only one correct answer, which makes grading simple and the benchmark easy to use.
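To illustrate why short, single-answer questions are easy to grade, here is a minimal sketch of a normalized string comparison. It is an assumption made for illustration, not the grading method BrowseComp itself uses, and the function names `normalize`, `is_correct`, and `accuracy` are hypothetical.

```python
import re
import unicodedata


def normalize(answer: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", answer)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces so "Tower!" and "Tower" compare equal.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def is_correct(predicted: str, reference: str) -> bool:
    """Hypothetical grader: exact match after normalization."""
    return normalize(predicted) == normalize(reference)


def accuracy(pairs) -> float:
    """Fraction of (predicted, reference) pairs graded as correct."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_correct(p, r) for p, r in pairs) / len(pairs)
```

A strict comparison like this would mark a correct answer wrong if it were phrased differently from the reference, so a real grader would likely need to be more forgiving. The sketch only shows that a single short reference answer reduces grading to a comparison.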

The benchmark was created following the guidelines of OpenAI's previous factuality benchmark, SimpleQA. Human trainers were asked to create challenging, fact-seeking questions with single, indisputable, short answers that would not change over time and were supported by evidence. Three checks were then used to ensure that the questions were sufficiently challenging.

The trainers were asked to create tasks challenging enough that another person would not be able to solve them within ten minutes. To create such questions, trainers were encouraged to start with a fact and then write an "inverted" question, where the answer is hard to find but easy to verify.

The BrowseComp benchmark covers a diverse range of topics, including TV shows and movies, science and technology, art, history, sports, music, video games, geography, and politics.

OpenAI evaluated a range of models on BrowseComp, including models without browsing—GPT‑4o, GPT‑4.5, and OpenAI o1 (medium)—as well as GPT‑4o with browsing and Deep Research, an agent model explicitly trained for persistent web browsing. The results showed that both tool use and reasoning contribute meaningfully to performance on BrowseComp.

Deep Research significantly outperformed all other models, solving around half of the problems. Its ability to autonomously search the web, evaluate and synthesize information from multiple sources, and adapt its search strategy enables it to handle questions that are otherwise intractable.

A key feature of agents is that performance scales with respect to the amount of compute used at inference time. In a similar fashion, additional inference-time compute improves performance on BrowseComp, because the questions require iteratively browsing a large number of websites and combining information.

BrowseComp evaluates how well models can browse the internet to search for hard-to-find information. While BrowseComp does not aim to measure performance on common queries, it measures the ability to find a single targeted piece of information, is easy to evaluate, and is challenging for existing browsing agents. OpenAI hopes that open-sourcing BrowseComp drives research on more trustworthy and reliable AI.

This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.
