DCT-Former: Efficient Self-Attention with Discrete Cosine Transform PAPER

Requirements

conda env create -f environment.yml
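
After creating the environment, activate it before running any of the commands below. The environment name comes from the name field in environment.yml; <env_name> here is a placeholder, not the actual name:

conda activate <env_name>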

Dataset

Pretraining Dataset

The pre-processing stages are taken from academic-budget-bert; additional information is available in data/README.md.

python shard_data.py \
    --dir <path_to_text_files> \
    -o <output_dir> \
    --num_train_shards 256 \
    --num_test_shards 128 \
    --frac_test 0.1

python generate_samples.py \
    --dir <path_to_shards> \
    -o <output_path> \
    --dup_factor 10 \
    --seed 42 \
    --do_lower_case 1 \
    --masked_lm_prob 0.15 \
    --max_seq_length 128 \
    --model_name bert-base-uncased \
    --max_predictions_per_seq 20 \
    --n_processes 4
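
Note that max_predictions_per_seq 20 matches the other masking parameters: with masked_lm_prob 0.15 and max_seq_length 128, the expected number of masked tokens per sequence is 0.15 x 128 ≈ 19.2. To sanity-check the generated shards you can open one with h5py; the file name and dataset keys below follow the Nvidia-style BERT hdf5 layout that academic-budget-bert builds on, and are assumptions to verify against data/README.md:

import h5py

# <output_path> is the -o argument passed to generate_samples.py above;
# the shard file name is hypothetical, adjust it to what the script wrote.
with h5py.File("<output_path>/train_shard_0.hdf5", "r") as f:
    print(list(f.keys()))        # expected keys (assumption): input_ids, input_mask,
                                 # segment_ids, masked_lm_positions, masked_lm_ids, ...
    print(f["input_ids"].shape)  # (num_samples, max_seq_length), i.e. (*, 128) here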

Finetuning Dataset

For finetuning, the "Large Movie Review" (IMDb) dataset is used; it is freely available HERE. A minimal loading sketch follows.
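
The archive unpacks into the standard aclImdb/ layout (train/pos, train/neg, test/pos, test/neg, one review per .txt file). The sketch below assumes that layout; read_split is an illustrative helper, not part of this repository:

from pathlib import Path

def read_split(root, split):
    """Return (texts, labels) for one split; label 0 = neg, 1 = pos."""
    texts, labels = [], []
    for label, sub in enumerate(["neg", "pos"]):
        for path in sorted((Path(root) / split / sub).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels

train_texts, train_labels = read_split("aclImdb", "train")
test_texts, test_labels = read_split("aclImdb", "test")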

Training

Pretraining (English Wikipedia)
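
The training launch itself is not shown in this README; a plausible invocation, assuming main.py accepts --mode=train symmetrically to the --mode=test call below (an assumption, not a documented interface), would be:

python -m torch.distributed.launch --nproc_per_node=1 --master_addr="127.0.0.1" \
    --master_port=1234 main.py \
    --exp_name=<exp_log_path> \
    --conf_file_path=<log_dir> \
    --mode=train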

When the training is complete, run the following command to compute the pretraining metric (accuracy) on the validation set:

python -m torch.distributed.launch --nproc_per_node=1 --master_addr="127.0.0.1" \
    --master_port=1234 main.py \
    --exp_name=<exp_log_path> \
    --conf_file_path=<log_dir> \
    --mode=test

Finetuning (IMDb)

Acknowledgments

Reference (Published in Journal of Scientific Computing)

@article{scribano2023dct,
  title={DCT-Former: Efficient Self-Attention with Discrete Cosine Transform},
  author={Scribano, Carmelo and Franchini, Giorgia and Prato, Marco and Bertogna, Marko},
  journal={Journal of Scientific Computing},
  volume={94},
  number={3},
  pages={67},
  year={2023},
  publisher={Springer}
}