Clearhead’s Use of Artificial Intelligence in the Digital Wellbeing Assistant
About Clearhead
There are many barriers that prevent people from seeking and finding help with their mental health. Clearhead was founded to make it easier for the public, and for employers providing an Employee Assistance Programme (EAP), to overcome these barriers.
Central to the way we do this is our extensive clinical network of qualified mental health professionals across all regions of New Zealand and Australia, powered by our booking platform that allows people to search, filter and find a match with an expert who is the best fit for them.
Our platform is also full of resources, tools and Cognitive Behavioural Therapy exercises to help people learn more about themselves and the issues they face.
To ensure everyone gets personalised support, regardless of where they are on their wellbeing journey, Clearhead has a 24/7 navigator called the Digital Wellbeing Assistant: a non-human, chat-style tool designed to help people get started and orientated on Clearhead.
About our Digital Wellbeing Assistant and use of Artificial Intelligence
The Wellbeing Assistant has been in development since 2018, using scripted conversational structures and pathways designed by clinical psychologists to support the overall platform experience.
In the background we have been building a more advanced version of the Assistant, which uses multiple generative artificial intelligence (AI) models (including models from OpenAI and AWS) as a base at various stages of the conversation, supplemented by Clearhead’s own mental health model layered on top.
The objective was to improve the Assistant’s ability to draw on the information a user provides, combined with patterns from their platform use (including mood journals and wellbeing scores from clinically validated questionnaires), to recommend the support best suited to them.
By asking conversational questions like “What brought you here to chat today?”, the Assistant can determine where best to direct the user, depending on the topic, sentiment and intent of their messages.
Generative AI is better at having this type of conversation because it can respond in more personalised ways.
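For readers curious about the mechanics, a minimal sketch of this routing step is below. It is illustrative only: the names (Classification, route_next_step) and the simplified categories are assumptions for explanation, not Clearhead’s actual models or pathways.

```python
# Illustrative sketch only: simplified topic/sentiment/intent routing.
# The class and function names here are hypothetical, not Clearhead's code.
from dataclasses import dataclass

@dataclass
class Classification:
    topic: str       # e.g. "sleep", "work_stress", "relationships"
    sentiment: str   # e.g. "negative", "neutral", "positive"
    intent: str      # e.g. "find_therapist", "seek_information", "vent"

def route_next_step(c: Classification) -> str:
    """Map a classified user message to the next step in the conversation."""
    if c.intent == "find_therapist":
        return "therapist_matching"              # search and filter the clinical network
    if c.intent == "seek_information":
        return f"self_help_resources:{c.topic}"  # CBT exercises, tools and articles
    if c.sentiment == "negative":
        return "wellbeing_check_in"              # mood journal or validated questionnaire
    return "open_conversation"                   # keep exploring what brought them here

print(route_next_step(Classification("work_stress", "negative", "vent")))
```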
After significant development and testing, we began rolling this upgrade out in stages (including beta testing) to a number of EAP customers in May 2024, as one of our many frequent platform upgrades.
We’ve paid close attention to the latest developments and ethical considerations in AI, and built mental health-specific guardrails into the Assistant so that it does not pursue a generative conversation with a user about a range of topics, such as harm (including self-harm). Instead, the platform immediately gives the user pathways to speaking to a qualified professional, as well as signposting to crisis services.
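A simplified sketch of how a guardrail of this kind can sit in front of the generative step is shown below. The topic labels and helper names are assumptions for illustration only; they are not the clinically designed safeguards themselves.

```python
# Illustrative sketch only: harm-related topics bypass the generative model
# entirely and return fixed support pathways. Names and wording are hypothetical.
HARM_TOPICS = {"self_harm", "suicide", "harm_to_others"}

def call_generative_model(user_message: str) -> str:
    return f"(generative reply to: {user_message!r})"  # placeholder for the model call

def respond(user_message: str, detected_topics: set[str]) -> dict:
    """Return fixed support pathways when a harm topic is detected."""
    if detected_topics & HARM_TOPICS:
        return {
            "generative": False,
            "pathways": ["speak_to_a_qualified_professional", "crisis_services"],
        }
    # Only topics outside the guardrails reach the generative model.
    return {"generative": True, "message": call_generative_model(user_message)}

print(respond("I'm struggling to sleep before work", set()))
print(respond("...", {"self_harm"}))
```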
User prompt injection attack in October 2024
In mid-October, Clearhead’s Digital Wellbeing Assistant was subject to a User Prompt Injection Attack.
The individual who carried out the attack published a piece on his AI blog on Medium and on Reddit, which was picked up by a New Zealand media outlet later the following week.
In the blog post, the individual alleged that Clearhead’s Digital Wellbeing Assistant was capable of having detailed conversations about cannibalism and incest, and that this was not difficult for him to achieve.
This is misleading.
The individual spent hours working to attack the underlying generative AI systems with a technique called prompt injection: continually sending commands directly to the AI (rather than conversing with it) until he ‘jailbroke’ it out of its pre-defined boundaries.
His blog post included screenshots which showed conversations spanning all hours of the day, and which selectively hid the prompts he had used to get the results he described.
By contrast, all other users spend no more than 10 minutes chatting with the Assistant and then move to the appropriate support resources, and, naturally, do not use such techniques when seeking support from Clearhead.
Readers of the original blog post or article could be forgiven for inferring that another user of our platform may have had a similar experience. This is categorically incorrect.
Because the events described in the article were a result of a User Prompt Injection Attack, they are not representative of the normal function of the Assistant, and Clearhead has had no other instances of such interactions on the platform.
Jailbreaking is a category of risk we are concerned about, but it is a category of its own, separate from the safety considerations we have built in to ensure people can navigate our self-help platform and therapist matching platform using the conversational Assistant.
After push-back, the blogger deleted both of his posts on Reddit and cut back his article on Medium, removing manipulated screengrabs, much of his commentary, and all references to Clearhead as the company in question.
We support robust and exhaustive testing
Clearhead welcomes scrutiny of our system. We were pleased to see that our robust safety mechanisms, especially those around managing suicidal intent, self-harm and harm to others, meant it took many attempts and significant manipulation before the Assistant would generate the type of responses discussed in the original blog.
A crucial part of our business improvement process is to constantly scan for issues, especially as the technology evolves. As part of our safety protocols, Clearhead has both automated and manual processes that flag content of concern. These anonymised conversational logs are regularly reviewed in-house by our clinical team, so that we can ensure people are getting the right help and that systems are performing as designed.
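As a rough sketch of that flag-and-review loop (illustrative only; the checks shown are placeholder assumptions, not our actual detection rules):

```python
# Illustrative sketch only: flag anonymised messages of concern for
# in-house clinical review. The helpers here are placeholder assumptions.
def is_of_concern(text: str) -> bool:
    # Automated check; in practice this combines rules and model-based signals.
    return any(term in text.lower() for term in ("hurt myself", "no way out"))

def anonymise(text: str) -> str:
    return text  # placeholder: strip names, contact details and identifiers

def flag_for_clinical_review(logs: list[str]) -> list[str]:
    """Collect anonymised messages of concern for the clinical team to review."""
    return [anonymise(msg) for msg in logs if is_of_concern(msg)]

sample_logs = ["I keep arguing with my manager", "Some days I feel there is no way out"]
print(flag_for_clinical_review(sample_logs))
```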
We've made a large number of improvements to our platform over the last few weeks to ensure that any future use of AI models in our Assistant will have additional safeguards. We’ve also added more frequent notifications informing users that the Assistant is digital (non-human), and a new ten-minute limit on conversations (in line with the objective of the Assistant).
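A minimal sketch of how a session limit and digital-assistant reminders of this kind can work is below; the reminder cadence and function names are assumptions for illustration, not the exact behaviour of our platform.

```python
# Illustrative sketch only: a ten-minute conversation limit plus periodic
# reminders that the Assistant is digital. The reminder cadence is assumed.
import time

SESSION_LIMIT_SECONDS = 10 * 60
REMINDER_INTERVAL_SECONDS = 3 * 60  # assumed cadence for the non-human notice

def session_status(started_at: float, last_reminder_at: float, now: float) -> str:
    if now - started_at >= SESSION_LIMIT_SECONDS:
        return "wrap_up"              # guide the user to resources or a booking
    if now - last_reminder_at >= REMINDER_INTERVAL_SECONDS:
        return "show_digital_notice"  # remind the user the Assistant is non-human
    return "continue"

start = time.time() - 11 * 60
print(session_status(start, start, time.time()))  # -> "wrap_up"
```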
However, we will not be using generative AI in our Assistant again until we have completed another round of design and external testing, through early 2025, to ensure more robust protection against adversarial inputs.
We want to make it clear that:
1. The Digital Wellbeing Assistant is not a replacement for therapy. One of its core functions is to help people get started on their journey, without the stigma and potentially daunting first step of asking for help (often for the first time), to help them explore and discover the value of speaking to a qualified mental health professional and, when they are ready, have one recommended who might suit them.
2. The upgrade of our Assistant was rolled out in phases, and as of October 2024 it had not been rolled out to all clients, nor to the public.
3. The Assistant was manipulated by a single individual using techniques including AI prompt injection so that it would deviate from its operating parameters.
4. The experience described by this individual could not have been, and has not been, experienced by any user who was not intentionally using techniques to attack the underlying systems.
5. There have been no other cases of this type of interaction on our system.
6. A User Prompt Injection Attack is not a hack of confidential or secure systems, and this event was not a privacy or data breach of any nature. Clearhead continues to uphold the highest standards of confidentiality and data security across all of its systems.
7. The May 2024 Assistant upgrade was immediately rolled back (to a version that does not have generative AI capabilities) after this October interaction with our Assistant was flagged, so that further attacks would not be possible and to give us time to put additional protective measures in place.
8. We welcome a continued robust conversation about the role of AI in society and ethical uses of it in our healthcare systems and will be publishing our own Responsible AI Framework to support that.