Transformer - Attention Is All You Need

Chainer-based Python implementation of the Transformer, an attention-based seq2seq model without convolution or recurrence.
For the model architecture, see net.py.
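For reference, the core operation of the model is scaled dot-product attention over queries, keys, and values. Below is a minimal NumPy sketch of that single operation, for illustration only; net.py implements it (and multi-head attention) with Chainer functions, so the names and shapes here are assumptions rather than the actual code.

import numpy as np

def scaled_dot_product_attention(query, key, value, mask=None):
    # query: (len_q, d_k), key: (len_k, d_k), value: (len_k, d_v)
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)            # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # block disallowed positions (e.g. future tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ value                           # weighted sum of value vectors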

See "Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017.

This repository is partly derived from my convolutional seq2seq repo, which is also derived from Chainer's official seq2seq example.

Requirements

Prepare Dataset

You can use any parallel corpus.
For example, run

sh download_wmt.sh

which downloads and decompresses the training and development datasets from WMT Europarl into your current directory. These files and their paths are used as the defaults in the training script train.py.
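A parallel corpus of this kind is typically stored as a pair of line-aligned source and target files. As a rough illustration of that format only, here is a minimal sketch of reading such a pair into token sequences; the file names are placeholders, and the actual loading and vocabulary handling in train.py differ.

def load_parallel_corpus(source_path, target_path):
    # Line i of source_path is assumed to be translated by line i of target_path.
    with open(source_path, encoding='utf-8') as fs, open(target_path, encoding='utf-8') as ft:
        for src_line, tgt_line in zip(fs, ft):
            yield src_line.strip().split(), tgt_line.strip().split()

# Example with hypothetical file names:
# pairs = list(load_parallel_corpus('train.en', 'train.fr'))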

How to Run

PYTHONIOENCODING=utf-8 python -u train.py -g=0 -i DATA_DIR -o SAVE_DIR

During training, logs of loss, perplexity, word accuracy, and elapsed time are printed at regular intervals, and validation tests (perplexity, and BLEU for generation) run every half epoch. A generation test is also performed and its output printed so you can check training progress.
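As a side note, the perplexity in these logs is typically the exponential of the per-token cross-entropy loss; the exact computation in train.py may differ, but the relation is roughly the following sketch.

import math

def perplexity(total_cross_entropy, num_target_tokens):
    # Perplexity = exp(average cross-entropy per predicted token).
    return math.exp(total_cross_entropy / num_target_tokens)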

Arguments

Some of them are as follows:

Please see the others with python train.py -h.

Note

This repository does not aim to fully reproduce the results in the paper, so I have not rigorously verified its performance. However, I expect the implementation to be largely consistent with the model described in the paper. The differences I am aware of are as follows: