Trust in Chatbots and Taxonomy of Breakdowns

Project Year: 2020

Background and Motivation

  • Mental health apps are increasingly prevalent, with more than 12,000 available (Schueller et al., 2019)
  • Apps and content around mental health are largely unregulated (Neary and Schueller, 2018), and few apps are backed by research, RCTs, or efficacy data
  • Chatbots are increasingly utilized in this domain (e.g., Woebot, Wysa, Replika)
  • Breakdowns in communication decrease trust in chatbots (Ashktorab et al., 2019)

Research Questions:

  • What features lead to increased perceptions of trustworthiness and competence in mental health chatbots?
    • Hypothesis: Greater presence of communication breakdowns will lead to decreased perceptions of trust and working alliance
    • Working Alliance: client/therapist relationship and shared goals toward positive change

Methods used: Diary Study, Standardized Scales

  • Compared two mental health chatbot apps - Wysa and Woebot
  • Diary Study - Participants used each app and kept a log of:
    • Duration of conversations
    • Communication breakdowns experienced
    • Empathetic expressions or chatbot self-referencing
  • Pre- and post-study surveys containing:
    • Source Credibility Measure (trust and competence)
    • Working Alliance Inventory
  • Dependent Variables: Measures of Trust, Competence, and Working Alliance
  • Independent Variable: Number of communication breakdowns/errors
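The diary log described above can be pictured as one record per conversation. The following is a minimal sketch only: the `DiaryEntry` type, its field names, and the sample entries are hypothetical and are not the study's actual instrument or data. It shows how the independent variable (number of breakdowns) could be tallied per app.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiaryEntry:
    """One logged conversation (hypothetical schema, not the study's form)."""
    participant: str
    app: str
    duration_min: float
    breakdowns: List[str] = field(default_factory=list)
    empathetic_expressions: int = 0


def breakdowns_per_app(entries: List[DiaryEntry]) -> Dict[str, int]:
    """Count logged communication breakdowns for each app."""
    counts: Counter = Counter()
    for entry in entries:
        counts[entry.app] += len(entry.breakdowns)
    return dict(counts)


# Illustrative entries only; these are NOT data from the study.
entries = [
    DiaryEntry("PX", "AppA", 10.0, ["Typos", "Abrupt shifts in topics"]),
    DiaryEntry("PX", "AppB", 12.0, ["Lack of acknowledgment"]),
    DiaryEntry("PY", "AppA", 8.0),
]

print(breakdowns_per_app(entries))
```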

Participants

  • 3 participants, each spending 10 days with each of the two apps
  • Participants chatted with Woebot daily for an average of 9.7 minutes and with Wysa daily for an average of 13.2 minutes

Results

P1: It [Woebot] seemed to genuinely care about how I was feeling, it was patient and respected my space (if I wanted it), and wanted me to feel better

P2: [Wysa] Repetitive not only in the activities, but also the CBT theories - every time, it says why the thought is unhelpful and how I can fix that by using positive thinking. There is no variation in its education

P3: Wysa is obviously unpolished. Woebot’s conversations felt fluid, restricted, and clear. Unlike that, Wysa’s conversations can be disrupted by bugs, innocuous treatment and interaction.

Types of Breakdowns

  • Conversation Flow
    • Offering strange presets
    • Abrupt shifts in topics
    • Ill-timed jokes
    • Illusion of choice
    • Providing two options that are both yeses
    • Lack of variety - must fit into the closest option
  • Misunderstandings/Inability to respond
    • Lack of acknowledgment
    • Inability to respond to emojis
    • Inability to understand free text
    • Continuing to an activity a user declined
    • Perceiving positive emotions as negative
  • Glitches
    • Typos
    • Repeated queries/circular convos
    • Lack of timely response
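For coding diary entries against this taxonomy, the three categories can be held in a simple lookup table. This is a rough sketch under stated assumptions: the `TAXONOMY` dictionary and the `categorize` helper are illustrative names, not part of the original study materials, and matching is by exact label only.

```python
from typing import Optional

# Category -> breakdown types, transcribed from the taxonomy above.
TAXONOMY = {
    "Conversation Flow": [
        "Offering strange presets",
        "Abrupt shifts in topics",
        "Ill-timed jokes",
        "Illusion of choice",
        "Providing two options that are both yeses",
        "Lack of variety",
    ],
    "Misunderstandings/Inability to respond": [
        "Lack of acknowledgment",
        "Inability to respond to emojis",
        "Inability to understand free text",
        "Continuing to an activity a user declined",
        "Perceiving positive emotions as negative",
    ],
    "Glitches": [
        "Typos",
        "Repeated queries/circular convos",
        "Lack of timely response",
    ],
}


def categorize(breakdown_type: str) -> Optional[str]:
    """Return the category for a breakdown label, or None if it is not listed."""
    for category, types in TAXONOMY.items():
        if breakdown_type in types:
            return category
    return None


print(categorize("Typos"))
print(categorize("Illusion of choice"))
```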

Examples of Conversation Breakdowns

Working Alliance Inventory (WAI)

  • Score out of 60 possible, higher = greater WA
             Wysa    Woebot
  P1         25      35
  P2         29      44
  P3         20      28
  Average    24.7    35.7
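The averages in the WAI table can be reproduced from the per-participant scores. The snippet below is a small arithmetic check; the variable names are illustrative, and the per-participant difference (Woebot minus Wysa) is an added convenience rather than a figure reported in the study.

```python
from statistics import mean

# WAI scores (out of 60) as reported in the table above.
wai = {
    "Wysa": {"P1": 25, "P2": 29, "P3": 20},
    "Woebot": {"P1": 35, "P2": 44, "P3": 28},
}

# Mean score per app, rounded to one decimal place as in the table.
averages = {
    app: round(mean(list(scores.values())), 1) for app, scores in wai.items()
}

# Per-participant gap between the two apps (Woebot minus Wysa).
differences = {p: wai["Woebot"][p] - wai["Wysa"][p] for p in wai["Wysa"]}

print(averages)     # {'Wysa': 24.7, 'Woebot': 35.7}
print(differences)  # {'P1': 10, 'P2': 15, 'P3': 8}
```

With only three participants, these figures are descriptive; they show that each participant rated Woebot higher on the WAI, not that the difference is statistically reliable.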

Source Credibility Measure (SCM)

  • Each subscale scored out of 42 possible, higher = better
                      Wysa    Woebot
  Competence (avg)    21.3    28
    P1                21      26
    P2                19      33
    P3                24      25
  Caring (avg)        25      34.3
    P1                21      36
    P2                31      36
    P3                23      31
  Trust (avg)         27.7    33.3
    P1                29      3
    P2                29      34
    P3                25      32

Discussion

Woebot

  • Chatbot is the only feature
  • Higher SCM + WAI scores
  • Offers few free-text responses - less room for misunderstanding
  • Breakdowns still occurred, though
  • Woebot asks for feedback on individual messages, and at the end of each conversation

Wysa

  • Chatbot is not the main feature
  • Solely free-text responses open the app up to breakdowns
  • Users may expect more from free-text responses because they seem more advanced
  • Offers only negative mood options - difficult to discuss positive days
  • Users felt like they had to carry the conversation
  • Felt like more of a machine, whereas Woebot felt like more of a friend