Trust in Chatbots and Taxonomy of Breakdowns

Project Year: 2020

Background and Motivation

Mental health apps are increasingly prevalent: more than 12,000 are available (Schueller et al., 2019)
Apps and content around mental health are largely unregulated (Neary and Schueller, 2018), and few apps are backed by research, RCTs, or efficacy data
Chatbots are increasingly utilized in this domain (e.g., Woebot, Wysa, Replika)
Breakdowns in communication decrease trust in chatbots (Ashktorab et al., 2019)

Research Questions:

What features lead to increased perceptions of trustworthiness and competence in mental health chatbots?
- Hypothesis: Greater presence of communication breakdowns will lead to decreased perceptions of trust and working alliance
- Working Alliance: client/therapist relationship and shared goals toward positive change

Methods used: Diary Study, Standardized Scales

Compared two mental health chatbot apps - Wysa and Woebot
Diary Study - Participants used each app and kept a log of: duration of conversations, communication breakdowns experienced, and empathetic expressions or chatbot self-referencing
Pre- and post- study surveys containing:
- Source Credibility Measure (competence, caring, and trust)
- Working Alliance Inventory
Dependent Variables: Measures of Trust, Competence, and Working Alliance
Independent Variable: Number of communication breakdowns/errors
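
As a rough sketch of how the diary logs could be structured so that the independent variable (number of breakdowns) falls out of them directly, the Python below may help. The `DiaryEntry` schema, its field names, and the `breakdown_count` helper are assumptions made for illustration, not the study's actual logging format, and the sample entries are invented placeholders rather than study data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiaryEntry:
    """One day's log for one participant and one app (hypothetical schema)."""
    participant: str
    app: str  # "Wysa" or "Woebot"
    duration_minutes: float
    breakdowns: List[str] = field(default_factory=list)
    empathetic_expressions: List[str] = field(default_factory=list)


def breakdown_count(entries: List[DiaryEntry], participant: str, app: str) -> int:
    """Total breakdowns one participant logged for one app."""
    return sum(
        len(e.breakdowns)
        for e in entries
        if e.participant == participant and e.app == app
    )


# Placeholder entries for illustration only; not data from the study.
entries = [
    DiaryEntry("PX", "Wysa", 12.0, breakdowns=["repeated query", "typo"]),
    DiaryEntry("PX", "Wysa", 9.5),
    DiaryEntry("PX", "Woebot", 10.0, breakdowns=["abrupt topic shift"]),
]

print(breakdown_count(entries, "PX", "Wysa"))
print(breakdown_count(entries, "PX", "Woebot"))
```

Keeping breakdowns as a list of labels, rather than a bare count, would also let each entry be mapped onto a category of breakdown later on.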

Participants

3 participants, 10 days using each of the two apps
Participants chatted with Woebot daily for an average of 9.7 minutes and with Wysa daily for an average of 13.2 minutes

Results

P1: It [Woebot] seemed to genuinely care about how I was feeling, it was patient and respected my space (if I wanted it), and wanted me to feel better

P2: [Wysa] Repetitive not only in the activities, but also the CBT theories - every time, it says why the thought is unhelpful and how I can fix that by using positive thinking. There is no variation in its education

P3: Wysa is obviously unpolished. Woebot’s conversations felt fluid, restricted, and clear. Unlike that, Wysa’s conversations can be disrupted by bugs, innocuous treatment and interaction.

Types of Breakdowns

Conversation Flow	Misunderstandings/Inability to respond	Glitches
Offering strange presets	Lack of acknowledgment	Typos
Abrupt shifts in topics	Inability to respond to emojis	Repeated queries/circular convos
Ill-timed Jokes	Inability to understand free text	Lack of timely response
Illusion of choice	Continuing to an activity a user declined
Providing two options that are both yeses	Perceiving positive emotions as negative
Lack of variety - must fit into the closest option

Examples of Conversation Breakdowns

Working Alliance Inventory (WAI)

Scores are out of a possible 60; higher scores indicate a stronger working alliance

	Wysa	Woebot
P1	25	35
P2	29	44
P3	20	28
Average	24.7	35.7
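
The averages in this table can be reproduced with a few lines of Python. The sketch below uses the per-participant WAI scores listed above; the variable names are illustrative, and the per-participant difference (Woebot minus Wysa) is an extra descriptive figure, not a statistic reported in the study.

```python
from statistics import mean

# Per-participant WAI scores (out of 60), copied from the table above.
wai = {
    "P1": {"Wysa": 25, "Woebot": 35},
    "P2": {"Wysa": 29, "Woebot": 44},
    "P3": {"Wysa": 20, "Woebot": 28},
}

# Mean score per app, rounded to one decimal place as in the table.
app_means = {
    app: round(mean(scores[app] for scores in wai.values()), 1)
    for app in ("Wysa", "Woebot")
}

# Within-participant difference: positive means Woebot scored higher.
differences = {p: scores["Woebot"] - scores["Wysa"] for p, scores in wai.items()}

print(app_means)
print(differences)
```

With only three participants these figures are descriptive; they are not enough to support a significance test.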

Source Credibility Measure (SCM)

Each subscale is scored out of a possible 42; higher is better

	Wysa	Woebot
Competence (avg)	21.3	28
P1	21	26
P2	19	33
P3	24	25
Caring (avg)	25	34.3
P1	21	36
P2	31	36
P3	23	31
Trust (avg)	27.7	33.3
P1	29	34
P2	29	34
P3	25	32
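
Because the WAI is scored out of 60 and each SCM subscale out of 42, the raw averages are not directly comparable across measures. One way to put them on a common footing, sketched below in Python, is to express each reported average as a percentage of its maximum. This normalization is an illustrative addition and was not part of the study's analysis; the names `MAX_SCORE`, `reported_averages`, and `percent_of_max` are assumptions.

```python
# Maximum possible score for each instrument, as stated in the text.
MAX_SCORE = {"WAI": 60, "SCM": 42}

# Reported averages from the WAI and SCM tables: (instrument, measure) -> app -> mean.
reported_averages = {
    ("WAI", "Working alliance"): {"Wysa": 24.7, "Woebot": 35.7},
    ("SCM", "Competence"): {"Wysa": 21.3, "Woebot": 28.0},
    ("SCM", "Caring"): {"Wysa": 25.0, "Woebot": 34.3},
    ("SCM", "Trust"): {"Wysa": 27.7, "Woebot": 33.3},
}


def percent_of_max(score: float, instrument: str) -> float:
    """Express a raw score as a percentage of the instrument's maximum."""
    return 100.0 * score / MAX_SCORE[instrument]


normalized = {
    measure: {
        app: round(percent_of_max(score, instrument), 1)
        for app, score in by_app.items()
    }
    for (instrument, measure), by_app in reported_averages.items()
}

for measure, by_app in normalized.items():
    print(f"{measure}: Wysa {by_app['Wysa']}%, Woebot {by_app['Woebot']}%")
```

On this scale Woebot's averages are higher than Wysa's on every measure, which matches the raw tables.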

Discussion

Woebot

Chatbot is the only feature
Higher SCM + WAI scores
Offers few free-text responses - less room for misunderstanding
Still saw breakdowns though
Woebot asks for feedback on individual messages, and at the end of each conversation

Wysa

Chatbot is not the main feature
Relies solely on free-text responses, which opens the app up to breakdowns
Users may expect more from free-text input because it seems more advanced
Offers only negative mood options - difficult to discuss positive days
Users felt like they had to carry the conversation
Felt like more of a machine, whereas Woebot felt like more of a friend

Tessa Eagle
