How low can you go and still call it evidence?


We and our patients deserve a better quality of evidence for digital mental health interventions.

In this month’s MJA, I read an article about encouraging year 7 students to adopt healthier behaviours. This is a worthy (and difficult) goal.

The team conducted a cluster randomised trial called Health4Life, where they used an online tool focused on lifestyle factors like diet, exercise, sleep and other behaviours that influence future health.

Despite a stellar effort in recruiting 71 diverse schools and around 6000 participants, the team found that the e-health program was not significantly more effective than usual school health education in changing any of the 12 outcomes.

Of course, this is not in itself surprising. In fact, I appreciate the team publishing a negative finding. However, what did surprise me was the conclusion.

“Future e‐health multiple health behaviour change intervention research should examine the timing and length of the intervention”, the authors wrote, “as well as increasing the number of engagement strategies (eg, goal setting) during the intervention”.

In other words, it didn’t work, but maybe if we nag the students more, we can make it work.

This got me thinking.

Why is it that research into digital strategies that are supposed to be therapeutic is willing to accept a much lower level of evidence to ground its conclusions? Why are these conclusions usually framed so positively, even when the data doesn’t really support them?

I’m not a Luddite; in fact, I’m usually an early adopter, and I frequently recommend digital mental health interventions to those of my patients who can access and read them. However, the science drives me nuts.

What constitutes ‘evidence’ in digital health trials?

There seems to be a concerning pattern in digital mental health trials, and it involves the quality of the data. Only one paper is cited in the Australian Commission on Safety and Quality in Health Care’s National Safety and Quality Digital Mental Health Standards (2020) to back the claim that digital mental health is accessible, affordable and equitable.

The results from this trial are based on 0.5% of the cohort finishing the trial.

Despite the extraordinarily high drop-out rate, the team was happy to conclude that “this model of service provision has considerable value as a complement to existing services, and is proving particularly important for improving access for people not using existing services.”

The other 99.5% may well be quite unwell, but we will never know. We do know adverse events are rarely recorded in trials like this, and the drop-out rates are often high.

Anyway, it got me thinking: how small can a sample be before a journal will no longer accept a trial as evidence for digital health outcomes? So I went hunting, and was quite surprised by what I found.

Some of the trials are particularly mind-bending.

Here’s one that tried to determine whether an online program could reduce smartphone distraction. This makes me wonder whether diverting someone from their smartphone to an app on a smartphone is actually a helpful outcome.

Nevertheless, the team studied 143 university students and gave them an app with mindfulness exercises, self-monitoring and mood tracking for 10 days. Apparently, it made no difference to their habitual behaviours but might have reduced smartphone distraction and “promoted insight into online engagement”.

An app to discourage your dependence on an app.

For me, though, the winner of the “how low do you go” stakes is this trial.

The team was trying to study the use of a digital program for “screening, treatment, monitoring, and ongoing maintenance of mental health” for older adults. They used a tool that was “co-designed for this group” and enabled “access to a comprehensive multidimensional assessment, the results of which are provided in real time to enable consumers to make informed decisions about clinical and nonclinical care options independently or in collaboration with a health professional”.

All sounds quite appropriate thus far.

Until they recruited their sample – 19 participants, of whom 16 completed “at least part of the survey”. Apparently, they all reported good mental wellbeing, which might explain why “participants had difficulty identifying the relevance of the tool for their personal circumstances”.

There is no problem recruiting a sample without the condition you are studying, if you are simply looking at usability. However, the conclusion seems to be utterly unrelated to the results. The authors wrote:

These findings highlight the tremendous opportunity to engage older adults … and … support their mental health and wellbeing, either through direct-to-consumer approaches or as part of standard care. However, this study helps to establish and confirm that it is critical that the design and purpose of any HITs are relevant, appropriate, and personalised for older adult end users, accounting for differing demographic factors, interests, clinical profiles, and levels of need. As demonstrated in this study, the evaluation of HITs helps capture practical feedback on the design of HITs, allowing for iterative refinement before broader implementation, thus facilitating engagement and adoption.

The conclusion seems to boil down to a comment that perhaps the program would work better if it were used by the people it was designed for.

Nevertheless, by the time this paper, along with others in the series on the same digital program, was incorporated into a report to government, the report stated that all of the research, published in “prestigious academic journals” and presented at national and international conferences, was “conducted to a high standard of research excellence”.

How high can we go?

There are, of course, exemplary studies that should be used as a benchmark. It’s not about the numbers or the outcomes for me, it’s about the honesty.

I have always respected the team behind iBobbly, an app designed to prevent suicidality in Aboriginal and Torres Strait Islander youth. The deep engagement and commitment to genuine co-design is clear in the qualitative data, and it is this data that helps me, as a clinician, to decide when to prescribe or encourage this particular app.

If we had this sort of data for all the apps, I could make a measured and appropriate clinical decision on their use.
I need to know whether digital programs are safe, whether they can be accessed seamlessly, whether they improve a person’s sense of agency and confidence, and whether they are superior to what I already do.

As doctors, we are supposed to be good at science. Our journals are supposed to expect good science prior to publication.

It beggars belief that we are willing to accept a trial that studies 16 people without the condition of interest, who complete only part of an intervention they already consider irrelevant, and then concludes that the intervention offers a “tremendous opportunity” to help real patients with their real concerns.

Surely our patients, policymakers and fellow clinicians deserve better?

Associate Professor Louise Stone is a working GP who researches the social foundations of medicine in the ANU Medical School. She tweets @GPswampwarrior.

Professor Stone will be presenting at our upcoming event Burning GP on 14 and 15 June at the Mantra on Salt Beach, Kingscliff NSW. Only a handful of tickets are still available – see program and tickets here.
