AI beats doctors on empathy and quality

3 minute read


Welcome to another instalment of ‘ChatGPT is coming for your job’.


How would you expect an AI to go at answering patients’ unsolicited medical questions compared with real-life, warm-blooded human doctors?

It’ll have reasonably accurate information but get some things wrong, you may think, and as for its bedside manner … come on, it’s by definition an unfeeling, heartless algorithm – there are some things you can’t program.

Boy, does this study have news for you.

The authors mined the subreddit r/AskDocs, a forum with around 474,000 members, for patients’ medical questions and the answers given by registered medical professionals.

They then plugged each question into a fresh session of OpenAI’s ChatGPT (version 3.5) and submitted both sets of answers (stripped of identifying information) to a panel of doctors working across six specialities, blinded to who had written what. Each pair was assessed by three members of the panel.
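(For the terminally curious, the querying step is close to trivial. Here’s a rough sketch in Python, assuming OpenAI’s current API client and the gpt-3.5-turbo model – the study actually used the ChatGPT web interface, so treat this as an approximation, not the authors’ pipeline.)

    # Sketch only: the study used the ChatGPT web interface, not the API.
    # The client and model name here are assumptions for illustration.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def answer_question(question: str) -> str:
        """One patient question per fresh session: no history is carried
        over between questions, mirroring the study's protocol."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content

    questions = ["I stepped on a rusty nail; do I need a tetanus shot?"]
    ai_answers = [answer_question(q) for q in questions]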

The evaluators first judged which response was “better”, then rated each on a Likert scale for “the quality of information provided” (very poor, poor, acceptable, good or very good) and for “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic or very empathetic). The three ratings for each answer were averaged into a consensus score.
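(In code terms, that consensus step is just a label-to-number mapping plus an average. A minimal sketch – the 1-to-5 mapping is our assumption about how the ordinal labels were scored.)

    # Hypothetical illustration of consensus scoring: map each rater's
    # Likert label to 1-5, then average the three raters per answer.
    from statistics import mean

    QUALITY = {"very poor": 1, "poor": 2, "acceptable": 3,
               "good": 4, "very good": 5}
    EMPATHY = {"not empathetic": 1, "slightly empathetic": 2,
               "moderately empathetic": 3, "empathetic": 4,
               "very empathetic": 5}

    def consensus(ratings: list[str], scale: dict[str, int]) -> float:
        """Consensus score for one answer = mean of its three raters."""
        return mean(scale[r] for r in ratings)

    # Three evaluators rate one answer's quality:
    print(consensus(["good", "very good", "good"], QUALITY))  # ~4.33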

In nearly 80% of cases the evaluators preferred the AI’s answer to the human doctor’s.

On quality, they rated the humans’ answers 21% lower than ChatGPT’s, equating to an overall “good” score for the AI and “acceptable” for the humans.

The prevalence of sub-acceptable responses was an embarrassing 10 times higher for the doctors than for the chatbot (a mean of 27.2% vs 2.6%).

The proportion of good/very good responses was also much higher for the AI: 78.5% vs 22.1%.

Here’s the kicker: the human responses were rated 41% less empathetic than the bot’s. Responses rated empathetic or very empathetic were, again, 10 times more prevalent for the AI.

Yikes.

The AI also gave longer responses, with a mean of 180 words vs 52 words from the docs.

The authors of this paper deserve a medal for the most euphemistic framing of a study result.

They don’t say “an AI not only gives better answers, it’s also nicer than you – pull your socks up”.

They say the volume of electronic messaging has gone up 160% since the pandemic, and that additional messaging predicts additional doctor burnout, so wouldn’t it be great if AI could take on some of this burden for the sake of your mental health? For example, chatbots could draft responses for clinicians or support staff to edit.

Not only would it save clinicians time, but patients might even benefit: “If more patients’ questions are answered quickly, with empathy, and to a high standard, it might reduce unnecessary clinical visits, freeing up resources … Moreover, messaging is a critical resource for fostering patient equity, where individuals who have mobility limitations, work irregular hours, or fear medical bills, are potentially more likely to turn to messaging.”

Best of all, the AI won’t incur payroll tax or leave teabags in the sink.

Send story tips to penny@medicalrepublic.com.au for an instant 21% improvement in your bedside manner.
