Opinion | Are AI Chatbots in Healthcare Ethical?

Within a week of its Nov. 30, 2022, release by OpenAI, ChatGPT became the most widely used and influential artificial intelligence (AI) chatbot in history, with over a million registered users. Like other chatbots built on large language models, ChatGPT accepts natural-language text inputs and produces novel text responses based on probabilistic analyses of enormous bodies, or corpora, of pre-existing text. ChatGPT has been praised for producing particularly articulate and detailed text in many domains and formats, including not only casual conversation but also expository essays, fiction, song lyrics, poetry, and computer code. It has displayed enough domain knowledge to narrowly miss passing a certifying exam for accountants, to earn C+ grades on law school exams and B- grades on business school exams, and to pass parts of the U.S. Medical Licensing Exams. It has even been listed as a co-author on at least four scientific publications.

At the same time, like other large language model chatbots, ChatGPT regularly makes misleading or flagrantly false statements with great confidence (sometimes referred to as “AI hallucinations”). Despite significant improvements over earlier models, it has at times shown evidence of algorithmic racial, gender, and religious bias. Additionally, data entered into ChatGPT is stored by OpenAI and used for model training, threatening user privacy. In my own experience, when asked to evaluate hypothetical clinical cases, ChatGPT generates reasonable but inexpert differential diagnoses, diagnostic workups, and treatment plans. Its responses are comparable to those of a well-read and overly confident medical student with poor recognition of important clinical details.

This suddenly widespread use of large language model chatbots has brought new urgency to questions of artificial intelligence ethics in education, law, cybersecurity, journalism, politics — and, of course, healthcare.

As a case study on ethics, let’s examine the results of a pilot program from the free peer-to-peer therapy platform Koko. The program used GPT-3, a large language model from OpenAI in the same family as the model underlying ChatGPT, to generate therapeutic comments for users experiencing psychological distress. Users on the platform who wished to send supportive comments to other users had the option of sending AI-generated comments rather than formulating their own messages. Koko’s co-founder Rob Morris reported: “Messages composed by AI (and supervised by humans) were rated significantly higher than those written by humans on their own,” and “Response times went down 50%, to well under a minute.” However, the experiment was quickly discontinued because “once people learned the messages were co-created by a machine, it didn’t work.” Koko has made ambiguous and conflicting statements about whether users understood that they were receiving AI-generated therapeutic messages, but it has consistently reported that there was no formal informed consent process or review by an independent institutional review board.

ChatGPT and Koko’s therapeutic messages raise an urgent question for clinicians and clinical researchers: Can large language models be used in standard medical care, or should they be restricted to clinical research settings?

In terms of the benefits, ChatGPT and its large language model cousins might offer guidance to clinicians and even participate directly in some forms of healthcare screening and psychotherapeutic treatment, potentially increasing access to specialist expertise, reducing error rates, lowering costs, and improving outcomes for patients. On the other hand, they entail currently unknown and potentially large risks of false information and algorithmic bias. Depending on their configuration, they can also be enormously invasive to their users’ privacy. These risks may be especially harmful to vulnerable individuals with medical or psychiatric illness.

As researchers and clinicians begin to explore the potential use of large language model artificial intelligence in healthcare, applying principles of clinical research will be key. As most readers will know, clinical research is work with human participants that is intended primarily to develop generalizable knowledge about health, disease, or its treatment. Determining whether and how artificial intelligence chatbots can safely and effectively participate in clinical care would prima facie appear to fit perfectly within this category of clinical research. Unlike standard medical care, clinical research can involve deviations from the standard of care and additional risks to participants that are not necessary for their treatment but are vital for generating new generalizable knowledge about their illness or treatments. Because of this flexibility, clinical research is subject to additional ethical (and, for federally funded research, legal) requirements that do not apply to standard medical care but are necessary to protect research participants from exploitation. In addition to informed consent, clinical research is subject to independent review by knowledgeable individuals not affiliated with the research effort, usually an institutional review board. Both clinical researchers and independent reviewers are responsible for ensuring that the proposed research has a favorable risk-benefit ratio, with potential benefits for society and participants that outweigh the risks to participants, and with risks to participants minimized wherever possible. These informed consent and independent review processes, while imperfect, are enormously important for protecting the safety of vulnerable patient populations.

There is another, newer and evolving category of clinical work known as quality improvement or quality assurance, which uses data-driven methods to improve healthcare delivery. Some tests of artificial intelligence chatbots in clinical care might be considered quality improvement. Should these projects be subject to informed consent and independent review? The NIH lays out a number of criteria for determining whether such efforts should receive the added protections of clinical research. Among these, two key questions are whether the technique deviates from standard practice and whether the test increases risk to participants. For now, the use of large language model chatbots clearly both deviates from standard practice and introduces novel, uncertain risks to participants. It is possible that in the near future, as AI hallucinations and algorithmic bias are reduced and as AI chatbots are more widely adopted, their use may no longer require the protections of clinical research. At present, however, informed consent and institutional review remain critical to the safe and ethical use of large language model chatbots in clinical practice.

Benjamin Tolchin, MD, MS, is a neurologist at Yale School of Medicine and the Yale Comprehensive Epilepsy Center, and the inaugural director of the Yale New Haven Health System Center for Clinical Ethics.
