The effectiveness of artificial intelligence conversational agents in healthcare: a systematic review (Preprint)
Milne-Ives M., de Cock C., Lim E., Shehadeh MH., de Pennington N., Mole G., Meinert E.
<sec> <title>BACKGROUND</title> <p>High demand on healthcare services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behaviour change, treatment support, health monitoring, training, triage, and screening support. Automating these tasks could free clinicians to focus on more complex work and increase the accessibility of healthcare services for the general public. An overarching assessment of the acceptability, usability, and effectiveness of these agents in healthcare is needed to collate the evidence so that future development can target areas for improvement and the potential for sustainable adoption.</p> </sec> <sec> <title>OBJECTIVE</title> <p>This systematic review aimed to assess the effectiveness and usability of conversational agents in healthcare and to identify the elements that users like and dislike, in order to inform future research and development of these agents.</p> </sec> <sec> <title>METHODS</title> <p>PubMed, MEDLINE (Ovid), EMBASE, CINAHL, Web of Science, and the ACM Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in healthcare. EndNote (version X9; Clarivate Analytics) reference management software was used for initial screening; full-text screening was then conducted by one reviewer. Data were extracted and risk of bias was assessed by one reviewer and validated by another.</p> </sec> <sec> <title>RESULTS</title> <p>A total of 31 studies were selected, covering a variety of conversational agents: 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents, 3 each of interactive voice response calls, virtual patients, and speech recognition screening systems, 1 contextual question-answering agent, and 1 voice recognition triage system.
Overall, the reported evidence was mostly positive or mixed. Most studies reported positive or mixed findings for usability (27/30) and satisfaction (26/31), and positive or mixed effectiveness was found in approximately three-quarters of the studies (23/30); however, specific qualitative feedback highlighted several limitations of the agents.</p> </sec> <sec> <title>CONCLUSIONS</title> <p>The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfaction associated with the conversational agents investigated, but qualitative user perceptions were more varied. The quality of many of the studies was limited, and improved study design and reporting are necessary to evaluate more accurately the usefulness of the agents in healthcare and to identify key areas for improvement. Further research should also analyse the cost-effectiveness, privacy, and security of these agents.</p> </sec> <sec> <title>CLINICALTRIAL</title> <p>N/A</p> </sec> <sec> <title>INTERNATIONAL REGISTERED REPORT</title> <p>RR2-10.2196/16934</p> </sec>