The Growth and Role of Large Language Models in Healthcare

With the arrival of Large Language Models (LLMs) onto the stage of popular attention in the last two years, questions about how artificial intelligence (AI) will shape our future have rightly grown in popularity and importance. Perhaps no single LLM has had more of an impact on this discourse than ChatGPT, a chatbot developed by OpenAI and the first true look we have gotten into the capabilities of mature models.

While OpenAI themselves are quick to point out that ChatGPT is trained on internet data and can produce incorrect results (it’s among the top five frequently asked questions in their "What is ChatGPT?" article), the answers the model gives are convincing enough that you would be forgiven for taking it at its word. For most queries this is arguably not an issue, but several categories of questions carry stakes high enough that no amount of error can be accepted. Take, for example, healthcare questions.

Being able to ask the internet about your ailments is no longer novel. Getting semi-accurate results is not even uncommon. But being able to converse with a seemingly intelligent entity about your health used to be limited to yearly checkups and virtual doctor appointments; maybe the occasional first date with a medical student. Now the free versions of several LLMs allow you to explore a model’s perception of human wellness in depth, without a qualified practitioner on hand to fact-check the conclusions reached.
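To make that concrete, here is a minimal sketch of the kind of unsupervised health Q&A described above, using OpenAI's Python client. The model name and prompt are illustrative assumptions, not a recommended clinical setup.

```python
# A minimal sketch of asking an LLM a health question with no practitioner
# in the loop, via OpenAI's Python client (pip install openai).
# Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed stand-in for the free tier's model
    messages=[
        {"role": "user", "content": (
            "I've had a dull headache and blurred vision for three days. "
            "What could it be?"
        )},
    ],
)

# The reply arrives fully formed and confident -- and nothing here
# fact-checks whatever conclusion the model reaches.
print(response.choices[0].message.content)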

Thankfully, this potential pitfall has not gone unnoticed by the research community. A recent meta-study found that accuracy concerns were raised in a third of the sources analyzed, placing the issue second in prevalence only to concerns about the ethics and bias inherent in models trained on internet data. There are promising signs for integrating LLMs into healthcare, such as the nearly 77% accuracy ChatGPT showed in diagnosing standardized clinical vignettes in one recent study on AI-assisted clinical decision making, but healthcare professionals, researchers, and developers alike continue to wrestle with the technology and its shortcomings.
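For readers curious how an accuracy figure like that might be produced, here is a rough sketch of a vignette-scoring loop. The vignettes, expected diagnoses, and string-matching rule below are hypothetical placeholders, not the cited study's actual protocol, which relied on clinician grading rather than automated matching.

```python
# A rough, hypothetical sketch of measuring diagnostic accuracy on
# standardized clinical vignettes. Vignettes and the naive scoring rule
# are placeholders, not the cited study's protocol.
from openai import OpenAI

client = OpenAI()

# (vignette, expected diagnosis) -- illustrative examples only
vignettes = [
    ("A 45-year-old presents with crushing chest pain radiating to the "
     "left arm, diaphoresis, and nausea.", "myocardial infarction"),
    ("A 6-year-old presents with fever, sore throat, and tonsillar "
     "exudate, with no cough.", "streptococcal pharyngitis"),
]

correct = 0
for vignette, expected in vignettes:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Give the single most likely diagnosis."},
            {"role": "user", "content": vignette},
        ],
    )
    answer = reply.choices[0].message.content.lower()
    if expected in answer:  # naive match; real studies grade by hand
        correct += 1

print(f"Accuracy: {correct / len(vignettes):.0%}")
```

Even a toy harness like this makes the shortcomings visible: a single rephrased answer can slip past or trip a scoring rule, which is exactly why studies in this space lean on human graders.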

Progress is rapid, however. The publicly available, free version of ChatGPT is powered by a tuned version of GPT-3.5, which was released nearly a year ago and was succeeded by GPT-4 roughly half a year ago. The capability difference between the two cannot be overstated, but it may be more easily understood by observing a similar jump in the abilities of another AI: Midjourney.

Predating the initial release of ChatGPT by nine months, the text-to-image model allowed a visceral exploration of generative art. Have you ever wondered what it would have looked like if Dalí had painted the Mona Lisa? If Barack Obama had been the lead singer of ’90s boy band NSYNC, curly blonde hair included? If Gandalf and the Hobbits had walked through the moon landing instead of the Mines of Moria?

If you could imagine it, Midjourney could show it to you. Or at least it could try. Often, there would be a glaring issue with the images generated from your prompts; a figure with three arms here, an unintelligible blob of text there. There is a name for this phenomenon: hallucination. AI’s oft-memed inability to generate realistic-looking hands has only recently begun to be rectified, though it certainly is being rectified.

Once LLMs can provide correct answers more consistently, they may become trustworthy additions to medical workflows, speeding the diagnosis and treatment of ailments in hospital and lab settings. But that will not be their final hurdle.

Midjourney is also the poster child of what might be the last hurdle for adoption of LLMs into the workplace: liability. A fierce debate currently surrounds the copyright system and its handling of AI-generated art. Does the person who wrote the prompt that the AI used to generate an image own the image? Does the AI model? Does the company that created or trained the AI? Does the person who created the image in the training set that is most like the output image? Who is to blame if the generated image causes harm?

Who is to blame if an LLM misdiagnoses an illness? If the prescribed treatment fails to heal the underlying cause? If the treatment ends up costing the life of the patient?

The ethics of AI-generated content are already being studied. This issue is not impossible to overcome, and the benefits of successfully integrating AI models, and LLMs specifically, into healthcare workflows are promising. Models might improve access to and discoverability of information about niche ailments, might speed the diagnosis of complex illnesses, and might help alleviate the current shortage of healthcare professionals.

Are you considering adding ChatGPT or another LLM to your workflow? How do you see it benefiting you, and what are your reservations?


About the Author: Dakota Methvin

Dakota Methvin is a Senior Software Engineer at Extract Systems. He received his bachelor’s degree in software engineering from the University of Wisconsin-Platteville and has focused on solution architecture and integrations.