New AI Shines in Healthcare, but Clinicians Still King

Over the past few years, excitement surrounding generative AI has reached a fever pitch. The rapid progress of large language models (LLMs) like OpenAI’s GPT-4 on tasks such as passing the bar exam has led prognosticators to debate which jobs this emerging technology could replace.

Healthcare is no different in anticipating the potential of AI, already looking to a future where it can help with diagnosis, treatment, and prescribing decisions. GPT-4 has been the subject of quite a bit of research, but it isn’t the only AI out there with visions of changing the future. Google’s Gemini is a large multimodal model, meaning it’s equipped to handle more than just text, accepting inputs like images and video. Google also builds specialized versions of Gemini; in healthcare that’s Med-Gemini, a family of models that can provide more relevant results than one designed to do anything and everything.
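
To make “multimodal” concrete, here’s a minimal sketch of what a mixed-input request could look like. The Part type and generate function below are hypothetical stand-ins, not Google’s actual SDK; the point is simply that one prompt can carry pixels alongside text.

```python
from dataclasses import dataclass

@dataclass
class Part:
    """One piece of a multimodal prompt: text, an image, or video."""
    kind: str          # "text", "image", or "video"
    data: str | bytes  # prose for text parts, raw bytes for media parts

def generate(parts: list[Part]) -> str:
    """Hypothetical stand-in for a multimodal model call; a real SDK
    would send every part to the model in a single request."""
    kinds = ", ".join(p.kind for p in parts)
    return f"(model response conditioned on {kinds} inputs)"

# A single prompt mixes text with image bytes, which a text-only LLM
# has no way to accept.
reply = generate([
    Part("text", "Describe any abnormalities in this chest X-ray."),
    Part("image", b"<raw PNG bytes>"),
])
print(reply)
```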

Results from Med-Gemini have been impressive. It outperformed GPT-4 across an assortment of medical benchmarks, including reaching 91.1% accuracy on a benchmark of questions representative of the US Medical Licensing Examination (USMLE). That’s a jump of roughly four and a half percentage points over Google’s previous healthcare-focused LLM, Med-PaLM 2.

Before Med-Gemini, Med-PaLM 2 was posting impressive results. (Source: https://sites.research.google/med-palm/)

Google has achieved this progress by tapping into uncertainty. Last week, we noted that one of the drawbacks of chatbots is their unwavering confidence. Google’s model has been trained to recognize when there is ambiguity in its result and to use internet searches to gain further context and arrive at a more accurate answer. Even this process can be refined to increase accuracy: the model currently searches the entirety of the web, but it could be restricted to medical journals and other reliable resources.
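
To make that loop concrete, here’s a minimal sketch of uncertainty-guided retrieval, using agreement between sampled answers as a cheap proxy for confidence. The generate and search_web callables and the threshold are illustrative assumptions, not Med-Gemini’s published internals.

```python
from collections import Counter

AGREEMENT_THRESHOLD = 0.8  # illustrative; not a published Med-Gemini setting

def answer_with_search(question, generate, search_web, n_samples=5):
    """Sketch of uncertainty-guided retrieval: sample several answers,
    and if they disagree too much, pull in outside context and re-ask.
    `generate` and `search_web` are hypothetical callables."""
    # Sample multiple candidate answers and measure how much they agree.
    candidates = [generate(question) for _ in range(n_samples)]
    top_answer, top_count = Counter(candidates).most_common(1)[0]
    agreement = top_count / n_samples

    # High agreement: treat the majority answer as confident and return it.
    if agreement >= AGREEMENT_THRESHOLD:
        return top_answer

    # Low agreement signals ambiguity: retrieve context and answer again.
    context = search_web(question)
    return generate(f"Context:\n{context}\n\nQuestion: {question}")
```

Notice that restricting search_web to medical journals or other vetted sources would be a one-line swap here, which is exactly why that refinement is appealing.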

Google researchers warned that the technology still carries risks and that they intend to evaluate the model as a research tool before considering clinical applications. Still, studies on the efficacy of AI in clinical work are underway. Researchers from Mass General Brigham recently assessed the usefulness of GPT-4 in answering hypothetical questions from cancer patients, comparing the AI’s responses to ones radiation oncologists wrote manually.

In reviewing the collective responses, physicians had difficulty identifying which were drafted by a human and which by AI, with a full third of the AI responses being mistaken for human-written ones. Passing for human is nice, but the crucial metrics in research like this concern patient care.

Of the responses generated by GPT-4, more than half (58%) were judged acceptable to send to a patient with no edits, and 82% were considered “safe.” The unsafe remainder includes 7% of the total responses that researchers felt could pose a risk to patient safety. Notably, in those patient-safety cases the model’s biggest issue wasn’t misdiagnosis but a lack of urgency in telling patients to see a medical professional. Researchers believe this may be the result of the model being too polite, prioritizing the tone it believes the patient wants to hear over the severity of the issue.
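
Those buckets overlap, which makes the percentages easy to misread. Assuming the “safe” group includes the responses that were sendable as-is (which the figures imply), the arithmetic shakes out like this:

```python
acceptable_as_is = 58  # % sendable with no edits at all
safe = 82              # % posing no risk if sent unedited
safety_risk = 7        # % flagged as a potential risk to patient safety

safe_but_needs_edits = safe - acceptable_as_is  # 24%: fine after a touch-up
not_safe_unedited = 100 - safe                  # 18%: clinician must step in
unsafe_other = not_safe_unedited - safety_risk  # 11%: flawed, not dangerous

print(safe_but_needs_edits, not_safe_unedited, unsafe_other)  # 24 18 11
```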

At this point it’s safe to say that we’re not ready to ship clinicians out the door and replace them with computers on wheels, but it’s clear that AI models can produce at least some patient-ready materials, and significantly more with a clinician on hand to edit the responses. In a healthcare environment where burnout and staffing shortages are the lightning and thunder of our storm, we need to empower clinicians to be more effective at their jobs.

With artificial intelligence, clinicians shouldn’t be replaced; they can become editors of patient notes rather than writers. We know that incorporating virtual care into regular business has created more work for clinicians, so a tool that provides a head start can not only mitigate that increase but also win back some of the after-hours time that was necessary even before the pandemic.

Extract applies a similar philosophy to healthcare institutions’ incoming documents. For simple tasks, like identifying the type of document you have or the patient it belongs to, our software can take care of that, no sweat. For more complicated documents that might have dozens of lab results, we can still match the patient, order, and encounter and find all the discrete values, but your staff can review the results and make any small edits needed.
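
As an illustration only (the field names, confidence scores, and threshold below are invented for this sketch, not Extract’s actual implementation), confidence-based routing of an extracted document might look like this:

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.9  # illustrative; tune to your tolerance for manual review

@dataclass
class Extraction:
    field: str         # e.g., "patient", "order", "encounter", "lab:WBC"
    value: str
    confidence: float  # the model's confidence in this one value

def route_document(extractions: list[Extraction]) -> str:
    """Sketch of confidence-based routing: file automatically when every
    extracted value is high-confidence; otherwise queue for staff review
    with the model's best guesses pre-filled."""
    if all(e.confidence >= REVIEW_THRESHOLD for e in extractions):
        return "auto-file"
    return "staff-review"  # review becomes an edit, not a rewrite

# Example: a lab report where one discrete value came back uncertain.
doc = [
    Extraction("patient", "Jane Doe", 0.99),
    Extraction("lab:WBC", "6.1 x10^9/L", 0.97),
    Extraction("lab:K", "3.4 mmol/L", 0.72),  # below threshold
]
print(route_document(doc))  # -> "staff-review"
```

The design goal in a sketch like this is that staff review becomes a quick edit of pre-filled values rather than data entry from scratch.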

Automation software is a fantastically powerful tool that’s progressing rapidly, but it’s not ready to take over healthcare jobs. What it’s ready to do is to empower existing staff to work at multiples of their current productivity, allowing for more attention to detail and less burnout. If you’d like to see how we do this at Extract, please reach out and we’ll schedule a demonstration of our software at your convenience.


About the Author: Chris Mack

Chris is a Marketing Manager at Extract with experience in product development, data analysis, and both traditional and digital marketing. Chris received his bachelor’s degree in English from Bucknell University and has an MBA from the University of Notre Dame. A passionate marketer, Chris strives to make complex ideas more accessible to those around him in a compelling way.