# transformer-sequential
This repo contains the code for three papers:

- Feedback Transformer
- Expire-Span
- Staircase Transformer

The training code is structured for long sequential modeling with Transformer-like architectures.
## Requirements
You will need a CUDA-enabled GPU to run the code.
## Setup
Run the following:

```bash
pip install -r requirements.txt
```
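Since the experiments need a CUDA-enabled GPU, a quick sanity check after installation (assuming the requirements install `torch`) is:

```python
# Sanity check: the experiments require a CUDA-enabled GPU.
import torch

assert torch.cuda.is_available(), "No CUDA device found"
print(torch.cuda.get_device_name(0))
```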
## Feedback Transformer
Introduced in *Addressing Some Limitations of Transformers with Feedback Memory*.
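The core idea is that every layer attends to a single shared memory built from the aggregated layer states of past timesteps, rather than each layer caching only its own past states, so top-level features feed back into even the lowest layers. Below is a minimal single-step sketch of this feedback memory; names and shapes are illustrative, not the repo's implementation:

```python
import torch
import torch.nn.functional as F

def feedback_step(x_t, memory, layers, agg_weights):
    """One decoding step with feedback memory (illustrative sketch).

    x_t:         (batch, d) input at the current timestep
    memory:      (batch, t, d) shared memory of aggregated past states
    layers:      callables mapping (query, memory) -> (batch, d)
    agg_weights: (n_layers + 1,) learnable weights over layer states
    """
    h = x_t
    states = [h]
    for layer in layers:
        h = layer(h, memory)  # every layer attends to the SAME shared memory
        states.append(h)
    # Aggregate all layer states into one vector and append it to memory,
    # so future timesteps expose top-level features to every layer.
    stacked = torch.stack(states, dim=1)              # (batch, L + 1, d)
    w = F.softmax(agg_weights, dim=0).view(1, -1, 1)
    m_t = (w * stacked).sum(dim=1)                    # (batch, d)
    memory = torch.cat([memory, m_t.unsqueeze(1)], dim=1)
    return h, memory
```

Because each step's memory entry depends on the previous step's full forward pass, decoding is strictly sequential; the paper trades training parallelism over time for this stronger recurrence.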
### Running Experiments from the Paper
#### enwik8
Model | Params | Valid | Test |
---|---|---|---|
Feedback Transformer | 77M | 0.984 | 0.962 |
Numbers are bits-per-character (BPC).

```bash
bash experiments/feedback/enwik8.sh
```
#### Algorithmic
Model | 3 Variable | 5 Variable |
---|---|---|
Transformer | 33.7 | 37.5 |
Feedback Transformer | 99.1 | 92.6 |
Numbers are % accuracy on the test set.

```bash
bash experiments/feedback/algorithmic_3var.sh
bash experiments/feedback/algorithmic_5var.sh
```
## Expire-Span
Introduced in *Not All Memories are Created Equal: Learning to Expire*.
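The mechanism: each memory `h_i` gets a predicted expiration span `e_i = L * sigmoid(w·h_i + b)`; once its age `t - i` exceeds `e_i` it is dropped from attention, with a linear ramp of length `R` keeping the cutoff differentiable. A minimal sketch of the masking step (hypothetical names, not the repo's code):

```python
import torch

def expire_span_mask(h, t, span_proj, max_span=2048.0, ramp=128.0):
    """Soft expiration mask for memories h_0..h_{t-1} (illustrative sketch).

    h:         (batch, t, d) hidden states of the stored memories
    t:         current timestep
    span_proj: nn.Linear(d, 1) predicting how long each memory should live
    """
    # Predicted span e_i in [0, max_span] for each memory.
    e = max_span * torch.sigmoid(span_proj(h)).squeeze(-1)        # (batch, t)
    # Age of each memory at the current step.
    age = t - torch.arange(t, device=h.device, dtype=h.dtype)     # (t,)
    # Linear ramp from 1 (kept) to 0 (expired) over `ramp` steps.
    m = torch.clamp((e - age) / ramp + 1.0, min=0.0, max=1.0)
    return m  # multiply attention weights by m; free memories where m == 0
```

Memories whose mask has reached zero can be physically freed, which is what lets the model scale to the very long spans in the tables below.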
### Running Experiments from the Paper
#### enwik8
Model | Params | Valid | Test |
---|---|---|---|
Expire-Span 12L | 38M | 1.014 | 0.994 |
Numbers are bits-per-character (BPC).

```bash
bash experiments/expire_span/enwik8.sh
```
#### Object Collision
Model | Maximum Span | Test Error (%) |
---|---|---|
Expire-Span | 16k | 52.2 |
Expire-Span | 32k | 36.7 |
Expire-Span | 64k | 26.7 |
```bash
bash experiments/expire_span/object_collision_16k.sh
bash experiments/expire_span/object_collision_32k.sh
bash experiments/expire_span/object_collision_64k.sh
```
## Staircase
Introduced in *Staircase Attention for Recurrent Processing of Sequences*. Note that the algorithmic task in this repo differs slightly from the one used in the paper, so the numbers may not match exactly, but they show the same trend. The model implementation and hyperparameters remain the same.
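For context, staircase attention processes the input in chunks and feeds each chunk's output states back in as inputs alongside the next chunk, so earlier tokens are repeatedly re-processed by the full layer stack over time. A rough sketch of one such forward pass (simplified; in practice the model bounds how many steps states are carried forward):

```python
import torch

def staircase_forward(tokens, embed, transformer, chunk_size):
    """Staircase-style recurrent processing (illustrative sketch).

    tokens:      (batch, seq_len) input token ids
    embed:       embedding module mapping ids -> (batch, n, d)
    transformer: Transformer stack mapping (batch, m, d) -> (batch, m, d)
    chunk_size:  number of new tokens added per recurrent step
    """
    batch, seq_len = tokens.shape
    prev_states = None  # recurrent states carried between steps
    outputs = []
    for start in range(0, seq_len, chunk_size):
        chunk = embed(tokens[:, start:start + chunk_size])  # (batch, c, d)
        # Earlier chunks' output states re-enter as inputs, so they pass
        # through the stack again at each step ("climbing the staircase").
        x = chunk if prev_states is None else torch.cat([prev_states, chunk], dim=1)
        y = transformer(x)
        outputs.append(y[:, -chunk.size(1):])  # states for the new tokens
        prev_states = y  # feed states forward (a bounded window in practice)
    return torch.cat(outputs, dim=1)
```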
### Running Experiments from the Paper
#### Algorithmic
Model | Test |
---|---|
Transformer | 58.44% |
Staircase Transformer | 3.6% |
Numbers are % error rate on the test set.

```bash
bash experiments/staircase/algorithmic_3var.sh
```
## License
The code is licensed under the CC-BY-NC license. See the LICENSE file for more details.