transformer-sequential

This repo contains the code for three papers: Feedback Transformer, Expire-Span, and Staircase (see the sections below).

The training code is structured for long sequential modeling with Transformer-like architectures.

Requirements

You will need a CUDA-enabled GPU to run the code.

Setup

Run the following:

pip install -r requirements.txt
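
To confirm that PyTorch (installed via requirements.txt) can actually see a CUDA device before launching the experiment scripts below, a quick sanity check is:

import torch

# Verify that a CUDA-enabled GPU is visible to PyTorch before running experiments.
assert torch.cuda.is_available(), "No CUDA device found; the experiments require a GPU."
print(f"Using GPU: {torch.cuda.get_device_name(0)}")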

Feedback Transformer

Introduced in Addressing Some Limitations of Transformers with Feedback Memory.
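
The key idea is that every layer attends to a single shared memory in which each past position is stored as a learned weighted sum of all layer outputs at that step, so decoding proceeds one token at a time. Below is a minimal, self-contained sketch of that idea only; it is not the implementation in this repo, and residual connections, feed-forward blocks, and normalization are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackMemorySketch(nn.Module):
    # Illustrative sketch of the feedback-memory idea (not the repo's implementation):
    # every layer attends over one shared memory, and each past position is stored in
    # that memory as a learned softmax-weighted sum of all layer outputs at that step.
    def __init__(self, dim, n_layers, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True) for _ in range(n_layers)]
        )
        self.layer_weights = nn.Parameter(torch.zeros(n_layers + 1))  # input counts as a "layer"

    def step(self, x_t, memory):
        # x_t: (batch, 1, dim) current position; memory: (batch, t, dim) or None
        states = [x_t]
        h = x_t
        for attn in self.layers:
            ctx = h if memory is None else torch.cat([memory, h], dim=1)
            h, _ = attn(h, ctx, ctx)          # all layers read the same shared memory
            states.append(h)
        w = F.softmax(self.layer_weights, dim=0)
        mem_t = sum(wi * si for wi, si in zip(w, states))  # compress all layers into one vector
        memory = mem_t if memory is None else torch.cat([memory, mem_t], dim=1)
        return h, memory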

Running Experiments from the Paper

enwik8

Model                  Params  Valid  Test
Feedback Transformer   77M     0.984  0.962

Numbers are Bits-Per-Character

bash experiments/feedback/enwik8.sh

Algorithmic

Model                  3 Variable  5 Variable
Transformer            33.7        37.5
Feedback Transformer   99.1        92.6

Numbers are % Accuracy on Test

bash experiments/feedback/algorithmic_3var.sh
bash experiments/feedback/algorithmic_5var.sh

Expire-Span

Introduced in Not All Memories are Created Equal: Learning to Expire.
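
The core mechanism is that each memory slot predicts how many steps it should stay alive; once its age exceeds the predicted span, it is softly masked out of attention and can be dropped. Below is a minimal sketch of such an expiration mask; it is not the repo's implementation, and the parameter names and default values are illustrative.

import torch
import torch.nn as nn

class ExpireSpanMaskSketch(nn.Module):
    # Illustrative sketch of the expire-span mask (not the repo's implementation).
    # Each memory slot i predicts a span e_i in [0, max_span]; at query time t it is
    # softly masked out once its age t - i exceeds e_i, with a linear ramp of width `ramp`.
    def __init__(self, dim, max_span=16384, ramp=128):
        super().__init__()
        self.span_predictor = nn.Linear(dim, 1)
        self.max_span = max_span
        self.ramp = ramp

    def forward(self, memory_h, ages):
        # memory_h: (batch, mem_len, dim); ages: (mem_len,) giving t - i for each slot
        spans = self.max_span * torch.sigmoid(self.span_predictor(memory_h)).squeeze(-1)
        # mask is 1 while the memory is young and decays linearly to 0 after it expires
        mask = torch.clamp((spans - ages) / self.ramp, min=0.0, max=1.0)
        return mask  # multiply into attention weights, then renormalize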

Running Experiments from the Paper

enwik8

Model            Params  Valid  Test
Expire-Span 12L  38M     1.014  0.994

Numbers are Bits-Per-Character

bash experiments/expire_span/enwik8.sh

Object Collision

Model        Maximum Span  Test Error (%)
Expire-Span  16k           52.2
Expire-Span  32k           36.7
Expire-Span  64k           26.7

bash experiments/expire_span/object_collision_16k.sh
bash experiments/expire_span/object_collision_32k.sh
bash experiments/expire_span/object_collision_64k.sh

Staircase

Introduced in Staircase Attention for Recurrent Processing of Sequences. Note that the algorithmic task in this repo differs slightly from the one used in the paper, so the numbers may not match exactly, but they show the same trend. The model implementation and hyperparameters remain the same. A simplified sketch of the staircase pattern follows below.
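
At a high level, the model processes the sequence chunk by chunk, feeding each step's output states forward into the next step together with the next chunk of tokens, which yields recurrence with shared transformer weights. The following is a simplified sketch of that pattern, not the repo's code; it assumes `transformer` is any module mapping (batch, seq, dim) to (batch, seq, dim).

import torch

def staircase_forward(transformer, token_embs, chunk_size):
    # Illustrative sketch of staircase-style recurrent processing (not the repo's code).
    # The sequence is split into fixed-size chunks; at every step the shared transformer
    # re-processes the states carried over from the previous step together with the next
    # chunk, so information flows forward recurrently across steps.
    carried = None                      # states handed forward from the previous step
    outputs = []
    for chunk in token_embs.split(chunk_size, dim=1):   # (batch, chunk, dim) pieces
        x = chunk if carried is None else torch.cat([carried, chunk], dim=1)
        y = transformer(x)                               # same weights at every step
        carried = y[:, -chunk.size(1):]                  # forward the newest states
        outputs.append(carried)
    return torch.cat(outputs, dim=1)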

Running Experiments from the Paper

Algorithmic

Model                  Test
Transformer            58.44%
Staircase Transformer  3.6%

Numbers are % error rate on Test

bash experiments/staircase/algorithmic_3var.sh

License

The code is licensed under the CC-BY-NC license. See the LICENSE file for more details.