Recurrent Memory Transformer

This code implements the RMT architecture from

https://arxiv.org/abs/2207.06881 by Aydar Bulatov, Yuri Kuratov, and Mikhail S. Burtsev.

Note: for a newer plug-and-play RMT version that works with pretrained HF models, please refer to "Scaling Transformer to 1M tokens and beyond with RMT" (paper, code).

RMT is a memory-augmented, segment-level recurrent Transformer. It achieves state-of-the-art results on the Hyperpartisan dataset and outperforms Transformer-XL on algorithmic tasks and on language modeling with limited input and memory size.

Recurrent Memory Transformer is implemented as follows:

[Figure: RMT architecture]

We implement our memory mechanism with no changes to the Transformer model by adding special memory tokens to the input sequence. The model is trained to control both memory operations and the processing of sequence representations.
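
The idea can be illustrated with a short, hedged sketch (this is not the repository's code; `backbone` stands for any standard Transformer that maps embeddings to hidden states, and all names below are illustrative):

```python
import torch
import torch.nn as nn


class RMTSketch(nn.Module):
    """Wraps an unchanged Transformer with segment-level memory tokens (illustrative sketch)."""

    def __init__(self, backbone: nn.Module, d_model: int, num_mem_tokens: int):
        super().__init__()
        self.backbone = backbone              # any standard Transformer over embeddings
        self.num_mem_tokens = num_mem_tokens
        # Learned initial memory token embeddings.
        self.mem_init = nn.Parameter(0.02 * torch.randn(num_mem_tokens, d_model))

    def forward(self, segment_emb, memory=None):
        # segment_emb: [batch, seg_len, d_model]; memory: [batch, num_mem_tokens, d_model]
        if memory is None:
            memory = self.mem_init.unsqueeze(0).expand(segment_emb.size(0), -1, -1)
        # [read memory; segment; write memory] -- the same memory is placed at both
        # ends so that, under a causal mask, the trailing copy can gather information.
        x = torch.cat([memory, segment_emb, memory], dim=1)
        h = self.backbone(x)
        new_memory = h[:, -self.num_mem_tokens:]                      # write-memory outputs
        segment_out = h[:, self.num_mem_tokens:-self.num_mem_tokens]  # token outputs
        return segment_out, new_memory


# Long inputs are processed segment by segment, passing memory forward:
# memory = None
# for segment_emb in segments:
#     out, memory = rmt(segment_emb, memory)
```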

Performance

| Task | Dataset | Metric | Transformer | Transformer-XL | RMT |
| --- | --- | --- | --- | --- | --- |
| LM* | WT-103 | ppl | 29.95 | 24.12 | 23.99 |
| LM* | enwik8 | bpc | 1.39 | 1.283 | 1.228 |
| step-by-step | quadratic equations | acc | – | 93.4 | 99.8 |
| Classification | Hyperpartisan | acc | – | 94.9 | 98.1 |

* - limited input and memory size

Code

Scripts for running the language modeling, algorithmic, and mathematical experiments can be found in the pytorch folder.

Our code is based on the Transformer-XL repository. The recurrent memory mechanism is implemented by updating the Transformer-XL PyTorch code. For details please refer to the source readme.

All LM and algorithmic experiments from our paper were conducted using this repository. Raw experiment results from the paper can be found in the experiments folder.

Reproduce results

Language modeling:

The same setup can be used with large models on WT-103 and with base and large models on enwik8.
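
For the actual launch commands, see the scripts in the pytorch folder. Purely as an illustration of what segment-level LM training involves (not the repository's scripts), the sketch below reuses the `RMTSketch` wrapper from above; `embed`, `lm_head`, and the other names are assumptions:

```python
import torch
import torch.nn as nn


def lm_train_step(rmt, embed, lm_head, optimizer, tokens, seg_len):
    """One training step on a long sequence split into fixed-size segments.

    tokens: [batch, total_len] token ids; gradients flow through the memory
    passed between segments (BPTT over the segment-level recurrence).
    """
    loss_fn = nn.CrossEntropyLoss()
    memory, total_loss = None, 0.0
    optimizer.zero_grad()
    for start in range(0, tokens.size(1) - 1, seg_len):
        inp = tokens[:, start:start + seg_len]              # current segment
        tgt = tokens[:, start + 1:start + seg_len + 1]      # next-token targets
        out, memory = rmt(embed(inp), memory)               # memory carried forward
        logits = lm_head(out[:, :tgt.size(1)])
        total_loss = total_loss + loss_fn(
            logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
    total_loss.backward()                                   # backprop across segments
    optimizer.step()
    return float(total_loss)
```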

Algorithmic tasks:

Run training:

Here LEN is the model input size. For training on the reverse or quadratic-equations tasks, substitute 'copy' with 'reverse' or 'sqeq'.
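
As a purely illustrative sketch of what the copy and reverse tasks ask for (the repository's actual data generators and formats may differ):

```python
import random


def copy_sample(vocab_size=10, length=6):
    # Source sequence of random symbols; the target is the same sequence.
    src = [random.randrange(vocab_size) for _ in range(length)]
    return src, list(src)


def reverse_sample(vocab_size=10, length=6):
    # Same kind of source, but the target is the sequence reversed.
    src = [random.randrange(vocab_size) for _ in range(length)]
    return src, src[::-1]


# The 'sqeq' task instead asks the model to solve a quadratic equation,
# producing the intermediate steps in the step-by-step setting.
```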

Citation

If you find our work useful, please cite the NeurIPS 2022 paper:

@inproceedings{
bulatov2022recurrent,
title={Recurrent Memory Transformer},
author={Aydar Bulatov and Yuri Kuratov and Mikhail Burtsev},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=Uynr3iPhksa}
}