TTGen-K-BERT

Source code for "TSQA: Tabular Scenario Based Question Answering". The implementation is based on K-BERT. We thank the authors of K-BERT for proposing such a novel knowledge embedding model.

Prepare

Download BERT-wwm-ext from here, convert it to the UER framework format with UER, and finally put the model under the directory ./models.

Split GeoTSQA into 5 folds:

python preprocess_data.py --all_data_path datasets/all.txt
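
The split itself is done by preprocess_data.py above; purely for illustration, a minimal 5-fold split could be sketched as follows (the line-based file format and the fold file names are assumptions, not the repo's actual layout):

# Hypothetical sketch of a 5-fold split; preprocess_data.py is the
# authoritative implementation, and the file layout below is an assumption.
from sklearn.model_selection import KFold

with open("datasets/all.txt", encoding="utf-8") as f:
    examples = [line for line in f if line.strip()]

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, dev_idx) in enumerate(kf.split(examples)):
    with open(f"datasets/fold{fold}_train.txt", "w", encoding="utf-8") as out:
        out.writelines(examples[i] for i in train_idx)
    with open(f"datasets/fold{fold}_dev.txt", "w", encoding="utf-8") as out:
        out.writelines(examples[i] for i in dev_idx)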

Sentence Ranking

Run the template-level ranking of TTGen:

python template_ranking.py --all_data_path datasets/all_template_ranking.txt --gpu 0

Run the sentence ranking of TTGen:

python cross_val.py --nn TTGen --gpu 0,1

QA

For multiple-choice QA, the data is organized in the following format (a small constructed example is given after the listing):

[
  [
    [
      scenario text,
      table describing text,
    ],
    [
      {
        "question": question text,
        "choice": [
          option A,
          option B,
          option C,
          option D,
        ],
        "answer": one of choice
      }
    ],
    id
  ],
  ...
]
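
As a quick sanity check of the format, a single record can be constructed and serialized with the standard json module (all field values below are placeholders, not real GeoTSQA data):

# Minimal constructed example of one record in the format above;
# every value is a placeholder.
import json

record = [
    [
        "Scenario text describing the setting ...",
        "Sentence generated from the table by TTGen ...",
    ],
    [
        {
            "question": "Which option is correct?",
            "choice": ["option A", "option B", "option C", "option D"],
            "answer": "option A",
        }
    ],
    "example-0001",
]

print(json.dumps([record], ensure_ascii=False, indent=2))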

Download C3 from here and coarse-tune BERT-wwm-ext:

CUDA_VISIBLE_DEVICES=0 bash train_c3.sh

Run question answering on GeoTSQA. For example, to use the top-1 sentence from the ranked sentence set generated by TTGen:

python cross_val_multi_choice.py --nn sentences_1 --data_path datasets/qa/sentences_1 --gpu 0,1
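
For context, keeping the top-k sentences from a ranked list before building the QA input could look like the following sketch (the (score, sentence) pair structure is an assumption about the ranker output, not the repo's actual format):

# Hypothetical sketch: keep the top-k sentences per example from a ranked
# list of (score, sentence) pairs; the pair structure is an assumption.
def top_k_sentences(ranked, k=1):
    ranked_sorted = sorted(ranked, key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in ranked_sorted[:k]]

ranked = [
    (0.2, "Sentence about the overall trend."),
    (0.9, "Sentence about the extreme value."),
    (0.5, "Sentence comparing two rows."),
]
print(top_k_sentences(ranked, k=1))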

If you have any problems, feel free to contact xiaoli.nju@smail.nju.edu.cn.