Abstract
Large language models (LLMs) have shown promising capabilities in healthcare analysis but face several challenges, such as hallucinations, stochastic parroting, and bias manifestation. These challenges are exacerbated in complex, sensitive, and low-resource domains. Therefore, in this work we introduce IC-AnnoMI, an expert-annotated motivational interviewing (MI) dataset built upon AnnoMI by generating in-context conversational dialogues leveraging LLMs, particularly ChatGPT. IC-AnnoMI employs targeted prompts carefully engineered through cues and tailored information, taking into account therapy style (empathy, reflection), contextual relevance, and false semantic change. Subsequently, the dialogues are annotated by experts, strictly adhering to the Motivational Interviewing Skills Code (MISC), focusing on both the psychological and linguistic dimensions of MI dialogues. We comprehensively evaluate the IC-AnnoMI dataset and ChatGPT’s emotional reasoning ability and understanding of domain intricacies by modeling novel classification tasks employing several classical machine learning and current state-of-the-art transformer approaches. Finally, we discuss the effects of progressive prompting strategies and the impact of augmented data in mitigating the biases manifested in IC-AnnoMI.
Motivation
Why focus on MI and LLMs? Motivational Interviewing (MI) is a proven therapeutic technique for behavioral change, but access is limited due to cost and clinician availability. LLMs could help democratize access, but they face critical challenges in sensitive healthcare domains: hallucinations, stochastic parroting, and bias manifestation.
Mental health domains also suffer from data scarcity: few publicly available resources exist to support the development of responsible AI systems. This work addresses the gap by creating high-quality synthetic MI dialogues using LLMs combined with rigorous expert annotation.
IC-AnnoMI Dataset
IC-AnnoMI is an expert-annotated motivational interviewing dataset built upon AnnoMI by generating in-context conversational dialogues using ChatGPT with carefully engineered prompts.
Data Generation Process
- Progressive Prompting: Iteratively refine prompts until output quality matches original MI dialogues
- Context-Aware Generation: Consider therapy style (empathy, reflection), contextual relevance, and semantic consistency
- Expert Annotation: Strict adherence to the Motivational Interviewing Skills Code (MISC)
Annotation Dimensions
| Dimension | Components |
|---|---|
| Psychological (MIpsych) | Empathy, Non-judgmental attitude, Therapist competence, Ethical conduct |
| Linguistic (MIlinguist) | Language comprehension, MI structure, False semantic change, Contextual reasoning |
Methodology
Progressive Prompt Engineering
We developed a systematic approach to prompt refinement:
- Initial prompts based on MI dialogue context, plausibility, and quality requirements
- Manual evaluation of outputs for inconsistencies
- Iterative tuning until generated quality matches original MI dialogues
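The refinement cycle above can be sketched as a simple loop. This is an illustrative sketch only; `generate` and `evaluate` are hypothetical callbacks standing in for the ChatGPT call and the manual expert review, and are not part of the paper's released code.

```python
def refine_prompt(base_prompt, generate, evaluate, max_rounds=5):
    """Progressively refine a prompt: generate a dialogue, collect the
    issues spotted during evaluation, and fold them back into the prompt
    as explicit corrective cues until no issues remain."""
    prompt = base_prompt
    dialogue = None
    for _ in range(max_rounds):
        dialogue = generate(prompt)      # e.g. an LLM API call
        issues = evaluate(dialogue)      # e.g. manual expert inspection
        if not issues:
            break
        # Turn the observed inconsistencies into tailored cues
        prompt += "\nAvoid the following issues: " + "; ".join(issues)
    return prompt, dialogue
```

The key design point is that feedback re-enters the prompt as explicit cues, so each round conditions the model on the failures of the previous one.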
MISC-Based Annotation
Annotations are grounded in the Manual for the Motivational Interviewing Skill Code (MISC), covering both psychological and linguistic dimensions:
- Empathy: Therapist's ability to demonstrate understanding through active listening and reflective statements
- Non-judgmental attitude: Creating a safe, supportive environment for clients
- Competence: Therapist's proficiency in applying MI techniques effectively
- Ethical conduct: Prioritizing client well-being, autonomy, and confidentiality
Evaluation
Classification Tasks
We model novel classification tasks to evaluate ChatGPT's capabilities:
- Identifying high- vs. low-quality MI dialogues
- Assessing emotional reasoning ability
- Understanding of domain intricacies
- Detecting biases (contextual, sampling, class imbalance)
Models Evaluated
- Classical machine learning approaches
- State-of-the-art transformer models
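As a concrete instance of the classical-baseline family above, a bag-of-words multinomial Naive Bayes classifier for the high- vs. low-quality dialogue task could look like the following. This is a minimal stdlib-only sketch with toy data; it is not the paper's actual pipeline, and the example utterances are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels):
    """Train multinomial Naive Bayes: per-class word frequencies plus priors."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Pick the label maximizing log P(label) + sum log P(word|label),
    with add-one smoothing over the shared vocabulary."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

A reflective, client-centered utterance then scores closer to the "high" class than a directive one, mirroring the dialogue-quality distinction the task targets.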
Key Findings
Main Contribution: IC-AnnoMI provides the MI community with a comprehensive dataset and valuable insights for using LLMs in empathetic text generation for conversational therapy in supervised settings.
Insights on LLM Usage in Healthcare
- Progressive prompting helps: Iterative refinement significantly improves dialogue quality
- Augmented data mitigates bias: Synthetic data can help address class imbalance issues
- Human supervision is critical: Unsupervised LLM use in sensitive domains poses risks
- Expert collaboration needed: Domain experts are essential for responsible LLM implementation
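The class-imbalance point above can be made concrete with a small sketch: topping up minority classes with synthetic (IC-AnnoMI-style) samples until each class matches the largest real class. Function and variable names here are hypothetical, not taken from the paper's code.

```python
import random
from collections import Counter

def balance_with_synthetic(real, synthetic, seed=0):
    """Augment a labeled dataset of (text, label) pairs: draw synthetic
    samples for each under-represented class until every class reaches
    the size of the largest real class."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in real)
    target = max(counts.values())
    pool = defaultdict_pool = {}
    for text, label in synthetic:
        pool.setdefault(label, []).append(text)
    augmented = list(real)
    for label, n in counts.items():
        extra = list(pool.get(label, []))
        rng.shuffle(extra)                 # sample without replacement
        augmented += [(t, label) for t in extra[: target - n]]
    return augmented
```

The same idea generalizes: synthetic dialogues fill gaps the original corpus cannot, which is how augmented data helps mitigate sampling and class-imbalance biases, provided the synthetic samples pass expert review first.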
Risks and Recommendations
We discuss the risks of unsupervised LLM use in healthcare, emphasizing:
- Need for collaboration with domain experts
- Importance of human supervision
- Responsible implementation across healthcare settings
Contributions
- Tailored prompting approach: Progressive prompt-based augmentation techniques for in-context MI dialogue generation
- Expert annotation scheme: Rigorous annotation covering psychological and linguistic aspects grounded on MISC
- Comprehensive evaluation: Baselines and analysis of LLM capabilities and limitations in sensitive domains
- Public resource: IC-AnnoMI dataset and source code publicly available
BibTeX
@inproceedings{kumar2024unlocking,
author = {Kumar, Vivek and Ntoutsi, Eirini and Rajwat, Pushpraj Singh and Medda, Giacomo and Reforgiato Recupero, Diego},
title = {Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health and Therapeutic Counselling},
booktitle = {Proceedings of the 1st International Conference on NLP \& AI for Cyber Security},
pages = {238--251},
year = {2024},
url = {https://aclanthology.org/2024.nlpaics-1.26/}
}