OpenAI Launches HealthBench to Evaluate AI Models in Healthcare

Edited by: Veronika Nazarova

On May 13, 2025, OpenAI introduced HealthBench, an open-source dataset for evaluating how reliably AI models provide medical advice. It benchmarks model responses against rubrics written by physicians. OpenAI's o3 reasoning model leads with a score of 60%, followed by Grok at 54% and Google's Gemini 2.5 Pro at 52%. The broader goal is a 24/7 AI doctor accessible via a pocket device, which could greatly improve healthcare accessibility, especially in remote areas. However, the resource-intensive nature of AI models may limit that accessibility, and ethical concerns about data privacy and misinformation remain.

