Conversational Word Embedding for Retrieval-based Dialog System

This repository contains resources for the following ACL 2020 paper.

Title: Conversational Word Embedding for Retrieval-based Dialog System
Authors: Wentao Ma, Yiming Cui, Ting Liu, Dong Wang, Shijin Wang, Guoping Hu
Link: https://www.aclweb.org/anthology/2020.acl-main.127/

News

2020/7/16 We have released the code. The data for the personachat example is in data_for_personachat (password vzoQ). You can clone the code and download the data to give it a try ^^
2020/7/6 We have uploaded the Chinese PR-Embeddings trained on Zhidao (password g3FK) and Weibo (password Yz6H): Zhidao Embedding | Weibo Embedding. The Zhidao Embedding is the one used in the experiments in the paper.

Requirements

gcc 4.8.5 (or >= 4.8.5)
Python 3.6
Keras 2.1.2 (or >= 2.1)
TensorFlow 1.12.0 (or >= 1.12)
(We ran the code with gcc 4.8.5 + Python 3.6 + Keras 2.1.2 + TensorFlow 1.12.0)

Quick Start

  1. Prepare the corpus: each line has two tab-separated columns that form a conversation pair, for example:
    Hi , how are you \t I am fine , Thank you !
  2. Train your PR-Embedding:
    sh train.sh $corpus_file $embedding_file
    $corpus_file is the corpus file from step 1, and $embedding_file is the output file for the trained PR-Embedding.
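
The corpus format in step 1 can be produced and checked with a few lines of Python. This is only a minimal sketch and not part of the repository: the helper names `write_corpus` and `check_corpus` and the file name `corpus.txt` are hypothetical, and the sentences are assumed to be already tokenized with spaces, as in the example line above.

```python
def write_corpus(pairs, path):
    """Write conversation pairs, one tab-separated pair per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for first, second in pairs:
            # Collapse stray tabs and newlines so each line keeps exactly two columns.
            first = " ".join(first.split())
            second = " ".join(second.split())
            f.write(first + "\t" + second + "\n")


def check_corpus(path):
    """Return 1-based numbers of lines that lack exactly two non-empty columns."""
    bad_lines = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, 1):
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 2 or not all(c.strip() for c in columns):
                bad_lines.append(number)
    return bad_lines
```

For instance, `write_corpus([("Hi , how are you", "I am fine , Thank you !")], "corpus.txt")` writes the example line from step 1, and `check_corpus("corpus.txt")` can be run before passing the file to `train.sh` to catch lines with a missing or extra column.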

Citation

If you use the data or code in this repository, please cite our paper:

@inproceedings{ma-etal-2020-conversational,
    title = "{C}onversational {W}ord {E}mbedding for {R}etrieval-{B}ased {D}ialog {S}ystem",
    author = "Ma, Wentao  and
      Cui, Yiming  and
      Liu, Ting  and
      Wang, Dong  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.127",
    pages = "1375--1380",
}