
Introduction

Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states into a shorter one, and hence reduces the computation cost. More importantly, by re-investing the FLOPs saved by the length reduction into a deeper or wider model, Funnel-Transformer usually achieves a higher capacity at the same FLOPs budget. In addition, with a decoder, Funnel-Transformer can recover a deep representation for each token from the reduced hidden sequence, which enables standard token-level pretraining.
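Concretely, the compress-then-recover pipeline can be sketched in a few lines of PyTorch. The snippet below is only an illustration of the idea, not the released implementation: the block depths, the strided mean pooling, and the one-layer decoder that upsamples the compressed states and adds back the first block's full-length states are all simplified, assumed choices.

```python
# A minimal, illustrative sketch of the funnel idea (NOT the released
# Funnel-Transformer code): transformer blocks separated by strided mean
# pooling that halves the sequence length, plus a one-layer decoder that
# upsamples back to token level for per-token (e.g. MLM) pretraining.
import torch
import torch.nn as nn

class FunnelSketch(nn.Module):
    def __init__(self, d_model=768, n_heads=12, layers_per_block=(2, 2, 2)):
        super().__init__()
        def layer():
            return nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.ModuleList(
            nn.ModuleList(layer() for _ in range(n)) for n in layers_per_block
        )
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2, ceil_mode=True)
        self.decoder = layer()

    def forward(self, h):                      # h: [batch, seq_len, d_model]
        full = None
        for i, block in enumerate(self.blocks):
            if i > 0:                          # compress before every block but the first
                h = self.pool(h.transpose(1, 2)).transpose(1, 2)
            for layer in block:
                h = layer(h)
            if i == 0:
                full = h                       # keep full-length states from block 1
        # Decoder: upsample the compressed states back to the original
        # length, add the full-length states from the first block, refine.
        factor = -(-full.size(1) // h.size(1))  # ceil division
        up = h.repeat_interleave(factor, dim=1)[:, :full.size(1)]
        return self.decoder(up + full)

x = torch.randn(2, 128, 768)                   # batch of 2, 128 tokens each
print(FunnelSketch()(x).shape)                 # torch.Size([2, 128, 768])
```

In this toy configuration, most of the encoder layers run at 1/2 and 1/4 of the input length, which is where the FLOPs savings come from, while the decoder restores the full token-level resolution at the end.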

For a detailed description of the method and the experimental results, please refer to our paper:

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Zihang Dai*, Guokun Lai*, Yiming Yang, Quoc V. Le

(*: equal contribution)

Preprint 2020

Source Code

Data Download

TensorFlow

PyTorch

Pretrained models

| Model Size | PyTorch | TensorFlow | TensorFlow-Full |
|---|---|---|---|
| B10-10-10H1024 | Link | Link | Link |
| B8-8-8H1024 | Link | Link | Link |
| B6-6-6H768 | Link | Link | Link |
| B6-3x2-3x2H768 | Link | Link | Link |
| B4-4-4H768 | Link | Link | Link |

Here, a size such as B10-10-10H1024 denotes a model with 3 blocks of 10 layers each and hidden size 1024.

Each .tar.gz file contains three items:

You can also use download_all_ckpts.sh to download all the checkpoints listed above.

For how to use the pretrained models, please refer to tensorflow/README.md or pytorch/README.md, respectively.
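If you just want to sanity-check a downloaded checkpoint before following those READMEs, a sketch along these lines may help; note that the file path below is a hypothetical placeholder, and it assumes the PyTorch checkpoint is a plain state dict readable with torch.load.

```python
# Hypothetical sketch: peek inside a downloaded PyTorch checkpoint.
# The path is a placeholder -- see pytorch/README.md for the actual
# entry points and file layout used by this repo.
import torch

state_dict = torch.load("path/to/extracted/model.ckpt.pt", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))    # a few parameter names and shapes
```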

Results

[Figure: GLUE dev set results]

[Figure: question answering results]