What if your AI scribe could see as well as hear?



It’s still at the proof-of-concept stage, and privacy and hallucination remain concerns, but the benefits are there.


A proof of concept study shows vision plus audio makes an AI scribe much more accurate than audio alone.

Regulators and clinicians are still grappling with the myriad accuracy, privacy and data security issues associated with AI scribes in clinical settings – and those are only “listening” in on consultations.

Now researchers from Flinders University’s College of Medicine and Public Health – working in its artificial intelligence and clinical epidemiology lab – have introduced vision into the equation.

They have tested a scribe which uses both audio and vision, specifically in the scenario of taking a medication history, and compared its results to an audio-only AI scribe.

“Generative AI has been around for a few years now, but it’s only really a recent thing where we’ve been able to process videos and audio. So this is really a demonstration of, well, okay, we’ve got these new capabilities of AI, how can we actually apply it to a task?” lead author and PhD candidate Bradley Menz told The Medical Republic.

One of the biggest arguments for AI scribes was that they could reduce clinicians’ administrative burden, but checking for inaccuracies also took time, so improving accuracy was part of the motivation, he said.

Another was figuring out how to optimise practice in his field.

“I’ve taken hundreds, thousands of medication histories before. Is there a way we might be able to do it a little bit more efficiently? How can we optimise some of our workflows at the end of the day? That’s really the goal of it,” he told TMR.

“To the best of our understanding, this was really the first [proof of] concept demonstrating that in a health context. But I’m sure there are many other ways in which it could be explored. Hopefully, our demonstration really is the start of a cascading effect that can be tried in many disciplines.”

Mr Menz and his colleagues at the centre have a pharmacy background, and “a keen interest” in public health and how AI could improve patient outcomes.

“As a group, we like to have a balanced approach to AI. I actually think there’s going to be so many applications we can use these tools clinically, and I do think for the most part, they’ll be for the greater good,” said Mr Menz.

“But by the same token, we need to be careful with errors. We need to be careful with privacy. There’s so many things, bias, safety and so forth.”

The study found the audio-visual AI scribe was 17 percentage points more accurate than the audio-only AI scribe (98% accuracy compared with 81%) and made 348 fewer omissions.

Out of the 2,160 data points, only 46 were scribed incorrectly. Ten were errors of omission, and 36 were errors of commission – in other words, hallucinations – including 11 errors in patient details, eight in medication name, eight in strength and form, two in dosing directions, two in indication, and five in clinical notes.
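As a quick sanity check, the figures reported above are internally consistent – the category counts sum to the commission total, the error totals add up, and the overall accuracy rounds to the 98% reported. A minimal tally, using only the numbers stated in the article:

```python
# Sanity check of the audio-visual scribe's reported error figures.
total_points = 2160
omissions = 10
# Commission errors (hallucinations) by category, as reported.
commissions = {
    "patient details": 11,
    "medication name": 8,
    "strength and form": 8,
    "dosing directions": 2,
    "indication": 2,
    "clinical notes": 5,
}

total_commissions = sum(commissions.values())  # 36
total_errors = omissions + total_commissions   # 46
accuracy_pct = (total_points - total_errors) / total_points * 100

print(total_commissions, total_errors, round(accuracy_pct, 1))  # 36 46 97.9
```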

“They’re the errors that my colleagues and I, we’re more concerned about. It’s guesswork and… quite dangerous.

“And what we found in our study was we were actually unable to identify why the error happened,” said Mr Menz.

“So for instance, there was a case where it got the wrong medication name. In clinical practice that could be quite detrimental. And similarly, it got the wrong strength and form for a medication. So it was meant to be 145mg but it scribed down 160.”

His group was very focussed on figuring out how to reduce these commission errors, he said.

The paper, to be published in npj Digital Medicine, is the final chapter of Mr Menz’s PhD thesis which also looks at the safety and the opportunities of AI in health.

In the study, five pairs of clinical pharmacists participated in 110 structured, mock medication history-taking interviews, with the history-taker wearing Ray-Ban Meta smart glasses.

“We selected that because at the time, to the best of my knowledge, it was the only available smart glasses. And wearable technology is quite useful, but it’s not for everyone, and there’s potentially many other ways in which you can do this. There was no strong preference… it just seemed to suit the study quite well,” Mr Menz explained.

The recordings from the glasses were then analysed by an AI scribe, developed using Google’s Gemini AI. The scribe was trained on 10 of these recordings. The scenarios involved some medication packs, some discussion of medications not present in the room and various commonly faced scenarios.

“We really tried to make the cases kind of round in the sense that they captured many different disease conditions that you would see in regular practice, and matching that up with five to 10 different medications. That could be injections, tablets, creams, ointments and so forth. So we really tried to test the scribe. We didn’t take the path of least resistance. We wanted to see what it could do,” said Mr Menz.

“It was funny because the 10 pharmacists that did the interactions, some of them really leant into the idea. They really played difficult patients and also very coherent patients,” he said.

Where the audio-visual scribe really shone was in the areas of strength and form of medication, he said.

“Patients might be inclined to say, ‘Yeah, I have one of these tablets at night.’ But with the vision scribe, it was able to capture what that person was talking about. With that, the audio scribe was only 28% accurate, but the vision scribe was 97%.”

And more accurate information led to fewer medical errors, he said.

But privacy was going to be a big issue.

“Some people might be fine with voice, but they might not be fine with being video recorded. It’s just a different layer of privacy, and that matters to people. The whole implementation of that would need to engage consumer perspectives along that process as well,” he said.

Mr Menz said the study was conducted by storing the recordings rather than sharing them, and then passing them through the scribe built on Google’s Gemini, so it was “a closed loop”.

“I think if we’re going to use these tools with real patients in the community, we need to be sure we know where that data is being shared.

“Now, in our case, I have confidence with ours. But I would be very cautious about some of these new technologies and where that information is shared. I think there’s certainly a lot of work that needs to be done there, not only from a tech standpoint, but also from an implementation standpoint.”

And does Australia have the existing frameworks to make it safe to use this technology?

“The whole AI space is moving so quickly. A lot of the scribes, they sit in this grey area. They’re being used and then regulation kind of sits somewhere in that picture,” Mr Menz said.

“Other AI tools might be very clinically focused applications, and they do go through a degree of regulation.

“This move towards generative AI – the likes of chatbots and the ability to process vision as well as sound – I think a lot of areas are playing catch-up. And it’s really hard to say with the scribe we’ve used in our study, where that falls,” he said.

Trials in other scenarios would be necessary before unleashing it into general use, he said.

“There’s a big gap between proof of concept, which is what we’ve done here, through to implementing something into routine practice.

“That can take years, to be honest.

“We have a core focus on trying to optimise AI and use it in clinical workflows, as well as auditing tools for safety. We’re looking more at those rates of error. How can we reduce commission errors? Because really, they’re the things that are going to reduce people’s trust in AI. We don’t want it to make errors. We want it to be as accurate as it can, and we want people to have confidence in using these things.”
