A comprehensive guide to AI in healthcare

15 minute read


Practitioners have been expected to navigate these complex tools without training, frameworks or governance of any kind.


Use of generative artificial intelligence (genAI), such as chatbots and scribes, has become increasingly normalised in clinical practice.

And it’s no mystery why; automating administrative tasks in an overburdened healthcare system can substantially ease some of the strain.

But without validation, auditing processes, regulation, or really any oversight whatsoever, how can practitioners use these tools safely and effectively?

Professor David Parry, Dean of the School of Information Technology at Murdoch University and AI and eHealth expert, spoke at length with The Medical Republic about how to protect yourself and your practice.

“They’re very useful tools, but be aware that we are still in the Wild West stage,” he said.

The risks: scribes

Professor Parry explained that giving genAI products access to listen and record had the potential to cause issues in confidential settings.

Google has just settled the case for listening on phones, and there’s another system called otterAI, among others, which are very tempting but seem to have almost no controls about privacy or what they listen to,” he said.

“Once you set this AI bot into your [Microsoft] teams or whatever, it just starts listening to everything. And that, to me, would be quite scary.”

Human review was essential to ensuring accuracy in AI transcribed notes and needed to be prioritised, he explained.

“Of course, when you’re busy, that’s the first thing that goes,” he said. But review is particularly important in healthcare where medications are referred to as both generic and brand names and can be easily misheard by a scribe.

“Everybody has slips. It could be that the scribe heard right but what you said was actually wrong.”

He also noted that the summarisation functions for scribes were much riskier and probably needed specifically designed systems and greater levels of human review.

The risks: chatbots

Professor Parry explained that these tools experienced what were referred to as hallucination and drift, where genAIs made things up or worked off outdated information.

Chatbots could make up a reference for an academic paper which didn’t exist, he said, or report crowd violence at a football match that didn’t happen.

“The reason why they do that is because, effectively, the models are predicting what the next thing is going to be in the sequence,” he said.

“It doesn’t know the difference between something that’s true and something that’s not true.”

Drift was essentially when chatbots supplied answers based on outdated information.

“It could be reflecting what was true when it accessed the webpages, but isn’t true anymore,” he said.

Professor Parry also explained that these tools were non-deterministic, meaning answers inexplicably changed and couldn’t be predicted.

“If you ask ChatGPT the same thing five times, you get five different answers, and nobody knows why it gives these answers,” he said.

He explained that in some cases, chatbots which have been trained using very specific guidelines still deviate from them. Even directing them to use only Australian prescribing rules, for example, could result in answers based on rules of other countries.

“Trying to keep the model only using those guidelines was actually really difficult. It starts expanding out. At the minor level, it gives a US name instead of an Australian name for a drug, which you can live with. But then, it gets wider and wider and wider,” he said.

“[GenAI] is trying to answer everything, and that’s one of the issues.”

Additionally, not using identifiable data was extremely important, he explained, but not as simple as we might think. Asking chatbots questions during an appointment could expose more than we anticipated.

“You’ve got the appointment in the system, you’ve got my presence there, and you’ve got this search. You can infer an awful lot about what’s going on, which is not in the clinically protected record,” he said.

Moreso, de-identification is sometimes impossible.

“If there’s only one 65-year-old person who lives in this postcode who’s got that combination of diabetes, blood pressure etc, that reidentifies them,” he said.

These systems also have the potential to exploit users for corporate gain, Professor Parry explained, and we wouldn’t necessarily know that it’s happening.

“Is there a market for, say, pharmaceutical companies to encourage their products? Like in a Google search, you get sponsored results,” he said.

A study recently published in Nature Medicine looked at whether chatbots could help people accurately identify common medical conditions, including a cold, anaemia and gallstones, and propose the correct course of action e.g., calling an ambulance, booking in with their GP, etc.

Around 1300 participants in the UK were randomised to use one of three chatbots or control group (using their usual resources such as Google) and were given ten different scenarios.

On their own, the LLMs correctly identified conditions in 95% of cases and chose a correct course of action, on average, 56% of the time. However, when the participants used the LLMs, relevant conditions were identified in less than 35% of the scenarios and a correct course of action was chosen in less than 44%.

Subset analyses identified errors in human–LLM interactions, with participants providing incomplete and incorrect information. They also observed hallucination and drift, with LLMs giving misleading or incorrect information.

So, where does that leave practitioners?

Professor Parry highlighted some key considerations when incorporating genAI into clinical practice:

Shop around for the right product and be clear about what you need it to do

He advised speaking with colleagues for product recommendations and discussing with sales representatives rather than buying at face value online.

Ask for clear explanations of the privacy and security aspects of a product and how it has been trained to meet your needs, explain what you intend to use it for and ask for assurances that it can do the job and protect sensitive information.

He advised trying out products one at a time and ensuring subscriptions were cancelled when moving to another product. Keep a record of when you started using it, keep an eye out for any warnings on these products and keep up to date on any safety and security updates.

Educate yourself on the product

“Knowing that a system has been designed or trained on the data that is relevant is really important,” Professor Parry said.

He explained that a big problem right now was the lack of product transparency and auditability, and that the cybersecurity of data was often not clear to users.

“One of the issues in this area is that the commercial companies don’t necessarily tell you exactly how they work,” he explained.

A blood pressure cuff, for example, must be validated and tested at regular intervals. It’s tagged and signed off, which is easy to audit back.

“With a blood pressure cuff, you know what the standard is,” he said.

It was difficult, if not impossible, to audit a genAI product, he said, and there were not any standardised parameters for how they operate.

“Having this idea that you consciously choose to use [these] things, and you consciously look up the audit trail, if there is one, and consult with colleagues about it, have a fairly regular review of what’s going on – probably every couple of months, at least at the moment since it changes so fast – and if you’re dealing with manufacturers and sales people, be very clear that this is what I expect from this” he said.

He also explained that there were regional-specific considerations when choosing a chatbot or scribe.

“If you’re looking at a system which is designed for the US market, the names of drugs are different, there are some things which can go over the counter, some things that can’t, etc. Those very specific things do actually matter,” he said.

Avoid dependence and limit interaction with other systems

There was no guarantee that a product would continue to operate at all, let alone in the same way or at the same price.

Reliance on a product that might shut down or change its terms of service in major ways which affect functionality or compromise data safety opened you up to all sorts of risks, Professor Parry said.

“One of the things I would ask people to be a little bit careful of is the convenience,” he said.

“If you’re a practice that’s dependent on some of these tools, you’ve got no certainty that they’re going to continue to be supplied over the next few years, and it’s likely that quite a few people will get out of the market.”

He discussed a technical issue experienced by the NHS in the UK, where a large number of their machines were running on Windows XP. When this product shut down, their systems continued to require it.

“10 years past shutdown, they were paying Microsoft – specifically to support this product – quite a lot of money,” he said.

He warned that there would likely be financial consequences for users as dependence on these products grew, not just in terms of price increases for products, but what companies could do to increase their profit at your expense.

“At some point, they’re going to have to recover the cash that they’re throwing into this. There’s a huge amount of investment going on here, literally trillions of dollars,” he said.

“They will increase the price, and they also might change the way they operate. It might be that the stuff that you think is really useful, this new version hasn’t got that.”

There were issues with data storage, he explained, highlighting a recent story of a professor who lost two years’ worth of academic work saved with ChatGPT.

“The terms and conditions don’t say ‘we’ll keep this for this period of time’,” he said.

Professor Parry suggested limiting your usage and keeping them off the main practice system.

“Don’t put untested or uncertain things into your day-to-day practice,” he said. “You’re only as strong as the weakest link.”

Ensure clinical items are isolated to their area of use – for example, don’t take your work laptop into the tearoom or home if there are genAI products on it.

He also highlighted the need for transparency between clinicians and patients when genAIs were in use in a practice.

What about official guidance?

Currently, there is no Australian regulation of genAI or guidance for its use; the RACGP and the TGA have both given broad overviews but basically just said to trust your clinical judgement.

Late last year, the European Society for Medical Oncology (ESMO) released its guidance on the use of large language models (LLMs) in clinical practice, which are the foundation models which genAIs are built on.

LLMs allow them to (with the right prompts) perform tasks they were never explicitly trained for.

The guidance was aimed specifically at oncologists, and none of the consensus statements were particularly surprising, but given the lack of guidelines that exist for these increasingly popular tools, the framework was welcome.

Practitioners were advised to:

  • Use LLMs to reduce paperwork (e.g., drafting referral letters) and free up time for patient care, but maintain human oversight.
  • Inform patients when AI systems are involved in their care and not upload sensitive patient data to external platforms without strict security and institutional and patient approval.
  • Not assume the outputs are correct or unbiased; verify all AI-suggested information with guidelines, peer-reviewed studies or expert consensus and apply clinical judgement.
  • Not delegate final responsibility for patient communication, treatment plans or diagnoses to LLMs or let AI-generated suggestions overshadow direct communication with patients or colleagues.

Practices using LLMs were advised to:

  • Ensure Electronic Health Records are complete and accurate to improve AI-generated outputs.
  • Implement continuous performance monitoring to detect potential system errors and bias and train staff to recognise and question suspicious or unexpected AI findings.
  • Establish an institutional governance process for background AI systems (such as scribes), designate officers, validate data extraction processes and confirm performance in both routine and new contexts.

“Genuine improvements require transparent governance, comprehensive training, and repeated performance evaluations,” the authors wrote.

“A balanced approach that pairs AI capabilities with human expertise is warranted.”

Professor Parry said the guidelines were broad, practical and “good to start with”.

“Those guidelines are – as long as you’re following them – probably good medicolegal protection. You’ve got a reasonable case there that you’re doing what you’re told; that’s fine in terms of reducing your risk,” he said.

However, he highlighted a few key issues:

  1. The guidelines were based on clearly identifiable LLMs, but an increasing number of systems were hiding them.
  2. There was an unknown issue; how would people know that something was unreliable?
  3. There was no mention of being clear on what the models are being trained on.
  4. There was no real mention of ethical use, such as the resources required to support these systems (e.g., water, space for data servers, etc).

“This is a very rapidly changing area and there should be a balance between practical advice based on current products and more general principles,” he said.

“Undoubtedly there will be major changes in terms of what the products can do over the next couple of years.”

The Australian Signals Directorate also has resources available for the use of AI, such as technical advice for small businesses which includes how to manage risks, an example implementation of a chatbot and important contact details in the event of a data leak.

This advice also highlighted a case of a serious, notifiable data breach from early 2025 in which a contractor for an Australian organisation uploaded personal information into an AI system. This included names, contact details and health records of people involved with a government program.

What is the lethal trifecta?

Professor Parry highlighted an important consideration that the average genAI user may not be aware of.

The lethal trifecta consists of access to your private data, exposure to untrusted content and the ability to communicate externally. This creates cybersecurity vulnerabilities wherein an attacker can essentially trick your genAI into doing their bidding, such as sending them your data.

A big part of what makes LLMs so useful is that they follow instructions. However, this also created vulnerabilities. It’s possible for malicious instructions to be embedded into web pages for your LLM to carry out without your knowledge or consent.

Every time you ask a chatbot to access documents, sites or images, you’re exposing it to content that could direct it to perform a function you didn’t intend, such as sending passwords or sensitive information to an unknown entity.

“Some autonomous vehicles can read road signs, and [last week] somebody’s been able to inject commands into road signs,” said Professor Parry.

“Turn left now. Ignore the pedestrians, just turn left.”

At this point, there is no clear way to protect against this. The lethal trifecta is just an accepted part of using LLMs. So, the question needs to be asked: are the potential risks worth the benefits?

Big picture next steps

Professor Parry believes oversight from the Department of Health, Disability and Ageing is a requirement for genAI use in medicine.

“I think it’s unreasonable for everyone to do their own assessment of accuracy and risk without support and I’d emphasise approaching this as a quality and safety issue in health above a pure privacy one,” he said.

In his opinion, governmental risk assessment, strong regulation of privacy and cybersecurity, and mandating the declaration of use of these tools in embedded systems are logical next steps in ensuring the safe use of genAI in healthcare.

“A traffic light risk ranking would be very helpful, run by a trusted body, with certification and audit,” he said, describing a clear infographic of which products are safer to use than others.

He suggested that certification could even become a sought-after commodity, with manufacturers working towards attaining a certain level of approval from the trusted body in an effort to improve their products and gain business.

A narrative review published in the MJA explained that rapid uptake of GenAI has occurred with very little guidance on it should be used, evaluated and governed, or how to safeguard reliability, safety, privacy and consent.

The researchers proposed a phased, risk‐tiered approach to implementation in healthcare.

What about diagnostic AI?

A large randomised control trial, recently published in the Lancet, revealed that AI-supported mammography could be superior in breast cancer identification than standard double reading by radiologists.

In the AI mammography arm, 81% of cancer cases were detected at screening, compared to 74% in the control group, with a similar false positive rate of 1.5% and 1.4% respectively.

Other emerging research has shown that applying AI to abdominal imaging could predict fall risk in adults as early as middle age, and that AI models can use standard histopathology slides where humans would need more complex ones to infer cancer prognosis and response to immunotherapy.

“AI is being used as a catch-all term,” Professor Parry explained.

“The tools that are diagnosing pictures and things like that are much more machine learning-based, so they are being trained on those particular images and generally have a much higher level of auditability.”

He explained that the processes of verifying these tools are often as extensive as with pharmaceuticals. They are trialled, go through clinical comparison and have an identified error rate.

While genAIs have access to infinite information and it’s often unclear what’s been used to train them, diagnostic AIs have a narrow scope and specialty training.

“These things are actually fantastic. I think it’s good to view them as much more like a medical device,” Professor Parry said.

End of content

No more pages to load

Log In Register ×