On May 13, 2025, OpenAI introduced HealthBench, an open-source dataset for evaluating AI models in healthcare. HealthBench assesses whether AI can provide reliable medical advice by scoring models' responses against rubrics written by physicians. The broader goal is a 24/7 AI doctor accessible from a pocket device. OpenAI's o3 reasoning model leads the benchmark with a score of 60%, followed by xAI's Grok at 54% and Google's Gemini 2.5 Pro at 52%. A round-the-clock AI doctor could greatly improve access to healthcare, especially in remote areas. However, the heavy computing resources that AI models require may limit that accessibility, and ethical concerns about data privacy and medical misinformation remain.
OpenAI Launches HealthBench to Evaluate AI Models in Healthcare
Edited by: Veronika Nazarova