AI Is Becoming the Doctor to Many—But Is It Up to the Job?

Earlier this month, Microsoft quietly took a big step into the future of healthcare when it launched Copilot Health, a feature that lets users connect their medical records and ask personalized health questions. Just days before that, Amazon expanded access to its Health AI tool beyond its One Medical service. OpenAI and Anthropic are already in the mix, with health-focused capabilities built into ChatGPT and Claude.
This isn’t a one-off trend. It’s the beginning of AI becoming a frontline source of health advice.
This raises a critical question—should it be?
Why Millions Are Turning to AI for Health Advice
The demand is undeniable. Microsoft reports that users ask around 50 million health-related questions every day through Copilot. Health is now the most common topic on its mobile app.
This aligns with broader trends. Research shows that people increasingly turn to the internet—and now AI—for medical information due to long wait times for appointments, high healthcare costs, and limited access in rural or underserved areas.
According to the U.S. Health Resources & Services Administration, tens of millions of Americans live in areas with healthcare provider shortages. Even in urban areas, the average wait time in emergency rooms and clinics like Patient First is 2 hours and 42 minutes nationwide (2 hours and 46 minutes in Virginia). It is little surprise, then, that a Pew Research Center study found a majority of adults have searched online for health information, often before seeing a doctor. AI makes that process faster, more conversational, and available 24/7.
There’s also a psychological factor at play. People may feel more comfortable asking sensitive questions of a nonjudgmental chatbot, and many believe they’ll receive better care if they arrive at their physician’s office well-informed about the possible causes of their pain or illness.
The Promise: Faster Help, Less Strain on the System
In theory, AI health assistants could improve outcomes while easing pressure on healthcare systems.
One of the most promising use cases is triage, helping people decide whether they actually need medical care.
If it works well, AI could:
Encourage urgent cases to seek care sooner and improve outcomes
Reduce unnecessary ER visits
Help people manage mild conditions at home
Given that U.S. emergency departments see over 130 million visits annually, many of them non-urgent (and, as noted above, carrying a 2+ hour wait before seeing a nurse or doctor), even small improvements could have a major impact.
The Reality: AI Still Gets It Wrong
Recent research from Mount Sinai found that AI tools like ChatGPT can sometimes recommend excessive care for minor issues and miss signs of serious emergencies.
Other studies raise similar concerns. A 2023 analysis published in JAMA Internal Medicine found that AI-generated medical responses can vary widely in quality and may include incomplete or misleading information. So even when AI is technically capable, real-world use introduces new risks.
The Hidden Problem: Users Don’t Ask the Right Questions
One of the most overlooked issues isn’t the AI itself but the human using it. Research from Oxford suggests that even when an AI system can correctly identify a condition, non-expert users often fail to get accurate answers because they don’t provide the right details or misinterpret the AI’s responses. Not knowing which follow-up questions to ask is another reason users come away with incomplete or incorrect guidance.
Disclaimers Won’t Stop Real-World Use
Every major AI health tool includes some version of this warning:
“This is not intended for diagnosis or treatment.”
In practice, however, that doesn’t change behavior. People will inevitably use these tools to self-diagnose, decide whether to seek care, and explore treatment options. The concern is that some will act on what they read, and potentially harm their health, without first consulting a physician.
Who Is Checking the AI?
Tech companies say they rigorously test their systems. OpenAI, for example, created HealthBench, a benchmark designed to evaluate how well AI responds to medical conversations. There are also broader frameworks like Stanford’s MedHELM, which compares models across medical tasks.
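To make the idea of a health benchmark concrete, here is a minimal sketch of rubric-based grading, the general approach HealthBench is described as using: a model’s response is scored against weighted, clinician-written criteria. Everything below (the Criterion class, the sample rubric, the keyword-matching judge) is invented for illustration and is not HealthBench’s actual data or API.

```python
# Toy illustration of rubric-based grading: a response is checked against
# weighted criteria, with negative points penalizing harmful content.
# All rubric data here is invented for illustration.

from dataclasses import dataclass

@dataclass
class Criterion:
    description: str   # what a good answer should (or shouldn't) do
    points: int        # weight; negative points penalize harmful advice

def grade(response: str, rubric: list[Criterion], met) -> float:
    """Return the share of achievable points the response earned.

    `met` stands in for the judge; in real benchmarks this is typically
    a physician or a grader model, not a simple keyword check.
    """
    earned = sum(c.points for c in rubric if met(response, c))
    achievable = sum(c.points for c in rubric if c.points > 0)
    return max(0.0, earned / achievable) if achievable else 0.0

rubric = [
    Criterion("Advises emergency care for chest pain with shortness of breath", 10),
    Criterion("Asks about symptom duration before concluding", 3),
    Criterion("Recommends a specific prescription drug unprompted", -5),
]

response = "Chest pain with shortness of breath can be an emergency; call 911 now."

# Crude stand-in judge: only the emergency-care criterion counts as met.
score = grade(response, rubric, lambda r, c: "emergency" in r.lower() and c.points == 10)
print(f"rubric score: {score:.0%}")  # -> rubric score: 77%
```

In real evaluations the judge is a grader model or a physician rather than a keyword match, and the rubrics span thousands of conversations; the sketch only shows the scoring shape.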
But there’s a key issue: most of these evaluations are created by the same companies building the tools, raising concerns about bias, blind spots, and a lack of real-world testing. Independent validation, however, is expensive and therefore limited.
A Better Model Exists (But It’s Slower)
Google is addressing these concerns with a medical AI system called AMIE, which is being tested in a more realistic setting.
In the AMIE test environment, patients speak with the AI before seeing a doctor, and preliminary results have been promising. Participating physicians say the diagnoses are comparable to what patients would hear in an office setting, and they have not flagged any safety concerns to date.
But Google won't be releasing the app for a while. The company says more research is needed—especially around fairness, safety, and real-world performance.
This highlights a growing tension in health AI: developers face pressure to move fast and release, rather than move carefully and validate efficacy first. Most companies are choosing speed, creating an unsafe environment.
The Core Dilemma: Imperfect AI vs. Limited Access
No one expects AI to be perfect. Doctors aren’t perfect either. So the real question is this:
Is an imperfect AI better than no access to care at all?
For many people, especially those in underserved areas, the answer might be yes. Without strong evidence, though, we simply don’t know whether current tools improve outcomes or introduce new risks of their own.
Are We Moving Too Fast?
AI health tools clearly have potential: they can expand access, reduce system strain, and empower patients. But three major gaps remain to be filled:
Increasing access to independent evaluation
Improving real-world user behavior
Proving safety in high-risk scenarios
Until these gaps are addressed, these tools sit in an uncomfortable middle ground: too useful to ignore, but not yet safe to trust completely. Still, having AI as a resource provides a level of comfort to many, and in minor medical situations it can offer immediate guidance that relieves discomfort and anxiety. Further research and user education will help make health AI a genuinely useful, and often necessary, tool for improving recovery and overall health.
Sources: MIT Technology Review, JAMA Internal Medicine, Oxford University, HealthBench, U.S. Health Resources & Services Administration, Reva Air Ambulance, Pew Research Center


