transformer-sequential

This repo contains the code for three papers: Feedback Transformer, Expire-Span, and Staircase (see the sections below).

The training code is structured for long sequential modeling with Transformer-like architectures.

Requirements

You will need a CUDA-enabled GPU to run the code.

Setup

Run the following:

pip install -r requirements.txt
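
To confirm that PyTorch (installed via requirements.txt) can actually see a CUDA device before launching the experiment scripts below, a quick sanity check is:

import torch

# Verify that a CUDA-enabled GPU is visible to PyTorch before running experiments.
assert torch.cuda.is_available(), "No CUDA device found; the experiments require a GPU."
print(f"Using GPU: {torch.cuda.get_device_name(0)}")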

Feedback Transformer

Introduced in Addressing Some Limitations of Transformers with Feedback Memory.
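
The key idea is that every layer attends to a single shared memory in which each past position is stored as a learned weighted sum of all layer outputs at that step, so decoding proceeds one token at a time. Below is a minimal, self-contained sketch of that idea only; it is not the implementation in this repo, and residual connections, feed-forward blocks, and normalization are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackMemorySketch(nn.Module):
    # Illustrative sketch of the feedback-memory idea (not the repo's implementation):
    # every layer attends over one shared memory, and each past position is stored in
    # that memory as a learned softmax-weighted sum of all layer outputs at that step.
    def __init__(self, dim, n_layers, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True) for _ in range(n_layers)]
        )
        self.layer_weights = nn.Parameter(torch.zeros(n_layers + 1))  # input counts as a "layer"

    def step(self, x_t, memory):
        # x_t: (batch, 1, dim) current position; memory: (batch, t, dim) or None
        states = [x_t]
        h = x_t
        for attn in self.layers:
            ctx = h if memory is None else torch.cat([memory, h], dim=1)
            h, _ = attn(h, ctx, ctx)          # all layers read the same shared memory
            states.append(h)
        w = F.softmax(self.layer_weights, dim=0)
        mem_t = sum(wi * si for wi, si in zip(w, states))  # compress all layers into one vector
        memory = mem_t if memory is None else torch.cat([memory, mem_t], dim=1)
        return h, memory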

Running Experiments from the Paper

enwik8

Model                  Params  Valid  Test
Feedback Transformer   77M     0.984  0.962

Numbers are Bits-Per-Character

bash experiments/feedback/enwik8.sh

Algorithmic

Model                  3 Variable  5 Variable
Transformer            33.7        37.5
Feedback Transformer   99.1        92.6

Numbers are % Accuracy on Test

bash experiments/feedback/algorithmic_3var.sh
bash experiments/feedback/algorithmic_5var.sh

Expire-Span

Introduced in Not All Memories are Created Equal: Learning to Expire.
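
The core mechanism is that each memory slot predicts how many steps it should stay alive; once its age exceeds the predicted span, it is softly masked out of attention and can be dropped. Below is a minimal sketch of such an expiration mask; it is not the repo's implementation, and the parameter names and default values are illustrative.

import torch
import torch.nn as nn

class ExpireSpanMaskSketch(nn.Module):
    # Illustrative sketch of the expire-span mask (not the repo's implementation).
    # Each memory slot i predicts a span e_i in [0, max_span]; at query time t it is
    # softly masked out once its age t - i exceeds e_i, with a linear ramp of width `ramp`.
    def __init__(self, dim, max_span=16384, ramp=128):
        super().__init__()
        self.span_predictor = nn.Linear(dim, 1)
        self.max_span = max_span
        self.ramp = ramp

    def forward(self, memory_h, ages):
        # memory_h: (batch, mem_len, dim); ages: (mem_len,) giving t - i for each slot
        spans = self.max_span * torch.sigmoid(self.span_predictor(memory_h)).squeeze(-1)
        # mask is 1 while the memory is young and decays linearly to 0 after it expires
        mask = torch.clamp((spans - ages) / self.ramp, min=0.0, max=1.0)
        return mask  # multiply into attention weights, then renormalize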

Running Experiments from the Paper

enwik8

Model            Params  Valid  Test
Expire-Span 12L  38M     1.014  0.994

Numbers are Bits-Per-Character

bash experiments/expire_span/enwik8.sh

Object Collision

Model        Maximum Span  Test Error (%)
Expire-Span  16k           52.2
Expire-Span  32k           36.7
Expire-Span  64k           26.7

bash experiments/expire_span/object_collision_16k.sh
bash experiments/expire_span/object_collision_32k.sh
bash experiments/expire_span/object_collision_64k.sh

Staircase

Introduced in Staircase Attention for Recurrent Processing of Sequences. Note that the algorithmic task in this repo differs slightly from the one used in the paper, so the numbers may not match exactly, but they show the same trend. The model implementation and hyperparameters remain the same. A simplified sketch of the staircase pattern follows below.
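
At a high level, the model processes the sequence chunk by chunk, feeding each step's output states forward into the next step together with the next chunk of tokens, which yields recurrence with shared transformer weights. The following is a simplified sketch of that pattern, not the repo's code; it assumes `transformer` is any module mapping (batch, seq, dim) to (batch, seq, dim).

import torch

def staircase_forward(transformer, token_embs, chunk_size):
    # Illustrative sketch of staircase-style recurrent processing (not the repo's code).
    # The sequence is split into fixed-size chunks; at every step the shared transformer
    # re-processes the states carried over from the previous step together with the next
    # chunk, so information flows forward recurrently across steps.
    carried = None                      # states handed forward from the previous step
    outputs = []
    for chunk in token_embs.split(chunk_size, dim=1):   # (batch, chunk, dim) pieces
        x = chunk if carried is None else torch.cat([carried, chunk], dim=1)
        y = transformer(x)                               # same weights at every step
        carried = y[:, -chunk.size(1):]                  # forward the newest states
        outputs.append(carried)
    return torch.cat(outputs, dim=1)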

Running Experiments from the Paper

Algorithmic

Model                  Test
Transformer            58.44%
Staircase Transformer  3.6%

Numbers are % error rate on Test

bash experiments/staircase/algorithmic_3var.sh

License

The code is licensed under the CC-BY-NC license. See the LICENSE file for more details.