Yahoo-CHQ-Summ
CHQ-Summ: A Dataset for Consumer Healthcare Question Summarization
Prerequisites
Download (or clone at tag v4.1.1) the Hugging Face Transformers repository; the seq2seq examples used here are at https://github.com/huggingface/transformers/tree/v4.1.1/examples/seq2seq
Place the contents of this Yahoo-CHQ-Summ repository into transformers/examples/seq2seq.
The code requires Python 3. Install the Python dependencies with the following command:
pip install -r requirements.txt
Data Preparation
- Download the CHQ-Summ dataset from the OSF repository and place train.json, val.json, and test.json in the data/dataset/CHQ-Summ directory.
- Download the Yahoo L6 dataset from here and place the XML file in the data/dataset/Yahoo-L6 directory.
- Run the following command to extract the subject and content fields from the Yahoo L6 dataset:
  python read_yahoo_data.py --Yahoo_data_path /path/to/the/Yahoo-L6/dataset --CHQ_summ_path data/dataset/CHQ-Summ
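The extraction step can be pictured with the short sketch below. It is not the repository's read_yahoo_data.py. It assumes that the Yahoo L6 XML stores each question as a <document> element with <uri>, <subject>, and optional <content> children, and that <uri> is a usable question key. Both assumptions should be checked against your copy of the dataset. The function name extract_subject_content is hypothetical. The sketch streams the file with xml.etree.ElementTree.iterparse, so the large XML file is never loaded into memory all at once. It does not show how the real script joins these fields to the CHQ-Summ JSON records.

```python
import xml.etree.ElementTree as ET


def extract_subject_content(xml_path):
    """Map each question's <uri> to its (subject, content) pair.

    Hypothetical helper. It assumes the Yahoo L6 layout of <document>
    elements with <uri>, <subject>, and optional <content> children.
    """
    questions = {}
    # iterparse streams the file instead of building the whole tree up front.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "document":
            continue
        uri = elem.findtext("uri")
        if uri is not None:
            questions[uri.strip()] = (
                (elem.findtext("subject") or "").strip(),
                # Some questions have no <content> body; fall back to "".
                (elem.findtext("content") or "").strip(),
            )
        elem.clear()  # drop the processed <document> subtree to save memory
    return questions
```

For example, extract_subject_content("data/dataset/Yahoo-L6/<your file>.xml") returns a dictionary keyed by URI. Streaming matters here because the Yahoo L6 dump is far larger than the CHQ-Summ splits.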
Running the code
- Update the CURRENT_DIR path in run_4_chq_sum.sh.
- Train and evaluate the models on the CHQ-Summ dataset:
  bash run_4_chq_sum.sh
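Before launching run_4_chq_sum.sh, you can run a quick pre-flight check to confirm that the three CHQ-Summ splits named in Data Preparation are in place. The sketch below is a minimal check that assumes you run it from transformers/examples/seq2seq, where the relative data/dataset/CHQ-Summ path presumably resolves. The helper name check_chq_summ is hypothetical. The check only tests that each file exists and is non-empty, because this README does not describe the JSON schema.

```python
from pathlib import Path

# Split files listed in the Data Preparation section.
SPLITS = ("train.json", "val.json", "test.json")


def check_chq_summ(data_dir="data/dataset/CHQ-Summ"):
    """Return a list of problems with the CHQ-Summ splits (empty if all look present).

    Hypothetical helper: it checks only existence and non-zero size,
    not the contents or schema of the JSON files.
    """
    problems = []
    for name in SPLITS:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"missing: {path.as_posix()}")
        elif path.stat().st_size == 0:
            # An empty file usually means an interrupted download or copy.
            problems.append(f"empty: {path.as_posix()}")
    return problems
```

An empty list from check_chq_summ() means all three splits were found. Otherwise, each entry names the file to fix before training.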