Repeat After Me: Transformers are Better than State Space Models at Copying

About

This repository gathers the experiments for the paper Repeat After Me: Transformers are Better than State Space Models at Copying. The experiments are divided into two parts: synthetic experiments and experiments on pre-trained models.

Installation

<tt>pip install causal-conv1d>=1.1.0</tt>: an efficient implementation of a simple causal Conv1d layer used inside the Mamba block.
<tt>pip install mamba-ssm</tt>: the core Mamba package.
<tt>pip install names</tt>: the <tt>names</tt> package, used to randomly sample names in the phone-book experiment.
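
The three packages can also be installed in one step; this is just the commands above combined, with the causal-conv1d version pin quoted so the shell does not interpret the <tt>>=</tt>:

pip install "causal-conv1d>=1.1.0" mamba-ssm names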

Other requirements:

Synthetic experiments

These experiments are intended to study (a) how well the models learn the copy task in distribution, (b) the length generalization ability of these models, and (c) their performance on lookup tasks where the model is given a prefix or suffix n-gram.

This folder covers three tasks (<tt>copy</tt>, <tt>prefix_ngram</tt>, <tt>suffix_ngram</tt>) and three model families: Transformers with different positional encodings (<tt>model = T_rope</tt>, <tt>T_nope</tt>, <tt>T_alibi</tt>, <tt>T_hard_alibi</tt>), Mamba (<tt>mamba</tt>), and LSTMs (<tt>lstm</tt>). For instance, to train a Transformer with RoPE positional encoding on the copy task with strings of length up to 20 and then evaluate it on strings of length 20, run:

python3 synthetic_tasks/main.py --model "T_rope" --train_task "copy" --eval_task "copy" --min_train_len 5 --max_train_len 20 --min_eval_len 20 --max_eval_len 20
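
To probe length generalization (point (b) above), the evaluation lengths can be set beyond the training range. The command below is only an illustration using the same flags as above; the evaluation length of 50 is an arbitrary choice:

python3 synthetic_tasks/main.py --model "T_rope" --train_task "copy" --eval_task "copy" --min_train_len 5 --max_train_len 20 --min_eval_len 50 --max_eval_len 50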
                               

Experiments on pre-trained models

These experiments cover three different tasks: copying natural-text strings from the C4 dataset (<tt>eval_task = c4_copy</tt>), lookup in a phone book (<tt>eval_task = phone_book</tt>), and question answering on <tt>squad_v2</tt> (<tt>eval_task = squad</tt>). In particular, we consider the following models:

For instance, to evaluate a pre-trained Mamba-370m on the phone-book dataset with 20 (name, phone number) entries, run:

python3 pretrained_exps/main.py --model "state-spaces/mamba-370m" \
                --eval_task "phone_book" \
                --min_eval_len 20 \
                --max_eval_len 20
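
The other evaluation tasks follow the same pattern by switching the task flag, e.g. to the C4 copying task. The command below is only a sketch that assumes <tt>min_eval_len</tt>/<tt>max_eval_len</tt> control the length of the copied string in the same way:

python3 pretrained_exps/main.py --model "state-spaces/mamba-370m" \
                --eval_task "c4_copy" \
                --min_eval_len 20 \
                --max_eval_len 20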

How to cite

@article{jelassi2024repeat,
  title={Repeat after me: Transformers are better than state space models at copying},
  author={Jelassi, Samy and Brandfonbrener, David and Kakade, Sham M and Malach, Eran},
  journal={arXiv preprint arXiv:2402.01032},
  year={2024}
}