Home

Awesome

CPsyCoun

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

<p align="center"> <a href="https://huggingface.co/CAS-SIAT-XinHai/CPsyCounX"><img src="https://img.shields.io/badge/CPsyCounX-yellow" alt="CPsyCounX"></a> <a href="https://github.com/CAS-SIAT-XinHai/CPsyCoun"><img src="https://img.shields.io/badge/GitHub-24292e" alt="github"></a> <a href="https://huggingface.co/datasets/CAS-SIAT-XinHai/CPsyCoun"><img src="https://img.shields.io/badge/CPsyCounD-yellow" alt="CPsyCounD"></a> <a href="https://huggingface.co/datasets/CAS-SIAT-XinHai/CPsyCounR"><img src="https://img.shields.io/badge/CPsyCounR-yellow" alt="CPsyCounR"></a> </p>

🔥News

Method

CPsyCoun Framework

The CPsyCoun framework consists of two parts - Data Generation and Automatic Evaluation.

Framework

Dialogue Reconstruction

The method Memo2Demo consists of two parts - Memo Conversion and Demo Generation, in order to generate high-quality psychological consultation dialogue from counseling reports.

Memo2Demo

Counseling Report

Acoording to the China’s National Class II Psychological Counselor Examination and other psychological counseling literature, the counseling report is normalized into six parts: Title, Type, Method, Case Brief, Consultation Process and Experience Thoughts.

Counseling_Report

CPsyCounD

The high-quality multi-turn dialogue dataset, which has a total of 3,134 multi-turn consultation dialogues.

Evaluation Framework

Evaluation Metrics

Score Criterion

Score Criterion

Turn-Based Dialogue Evaluation

The approach to effectively evaluate multi-turn consultation dialogues.

Denote a $m$-turn dialogue as a set of paired elements ${(q_i,r_i)|i=1, 2, ..., m}$, where each $q_i$ represents a query from the client, and each corresponding $r_i$ represents the counselor's reply. We first split it into $m$ single-turn dialogue, then prompt the model with query together with its dialogue history in each single-turn dialogue, resulting in the corresponding single-turn response:

math_1

where $h_i={(q_j, r_j)|j=1, 2, ..., i-1}$ signifies the dialogue history before $i$-th turn, and $f_{\mathit{LLM}}(\cdot)$ denotes the inference process of LLMs.

Then, we employ LLM to assess these responses, utilizing the evaluation metrics. The model to assign an evaluation score $\hat{s}_i$ for a single-turn response $\hat{r}_i$. Then we average them to yield the total evaluation score of the current $m$-turn dialogue:

math_2

CPsyCounE

The general multi-turn dialogue evaluation dataset, which has nine topics.

Experiments

Intrinsic Evaluation

Role-play VS Memo2Demo

Statistics

Intrinsic evaluation

Extrinsic Evaluation

CPsyCounX

We further fine-tune InternLM2-7B-Chat on CPsyCounD. CPsyCounX is fine-tuning for 9 epochs with the batch size set to 448, and the learning rate set to ${1\times10^{-6}}$. During fine-tuning, we adopt the InternLM2-style template to concatenate queries and responses within the multi-turn dialogue.

Results

Extrinsic evaluation

Radar plot

Full results

Citation

If you find our work helpful in your research, please cite the following paper:

@inproceedings{zhang-etal-2024-cpsycoun,
    title="{CP}sy{C}oun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for {C}hinese Psychological Counseling",
    author="Zhang, Chenhao  and Li, Renhao  and Tan, Minghuan  and Yang, Min  and Zhu, Jingwei  and Yang, Di  and Zhao, Jiahao  and Ye, Guancheng  and Li, Chengming  and Hu, Xiping",
    journal={ACL},
    year={2024}
}