TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
By Dongxu Li*, Chenchen Xu*, Xin Yu, Kaihao Zhang, Benjamin Swift, Hanna Suominen and Hongdong Li
(* Authors contributed equally.)
<img src='figs/teaser.png'>

This repository contains the implementation of TSPNet. The preprocessed dataset, video features, and inference results are available on Google Drive.
We thank the authors of fairseq for their efforts.
Requirements
- PyTorch version >= 1.4.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and (optionally) NCCL
- (optional) BPEmb, if you prepare the datasets yourself (see below)
Install from source
Install the project from source and develop locally:
cd TSPNet/
pip install --editable .
Getting started
Preprocessing
Download the preprocessed dataset and arrange it as follows:
TSPNet/
├── i3d-features/
│ ├── span=8_stride=2
│ ├── span=12_stride=2
│ └── span=16_stride=2
├── data-bin/
│ └── phoenix2014T/
│ └── sp25000/
│
├── README.md
├── run-scripts/
└── test-scripts/
- i3d-features: the I3D output features of the input videos (see the sketch below)
- data-bin: the preprocessed translation texts
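
The three span folders hold I3D features extracted with sliding windows of span 8, 12, and 16 at stride 2, as the folder names indicate. Below is a minimal sketch of inspecting one feature file; the per-video file naming and the .npy format are assumptions for illustration, not guarantees about the released archive:

# inspect_features.py -- a sketch only; the per-video .npy layout is an
# assumption about the released archive, not a documented guarantee
import numpy as np

# hypothetical path: one .npy file per video in each span/stride folder
feats = np.load("i3d-features/span=8_stride=2/VIDEO_ID.npy")

# expected shape: (num_windows, feature_dim); num_windows depends on the
# video length, the window span (8 frames), and the stride (2 frames)
print(feats.shape)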
Training
Go to the run-scripts folder and start training:

cd TSPNet/run-scripts
SAVE_DIR=CHECKPOINT_PATH bash run_phoenix_pos_embed_sp_test_3lvl.sh
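For example, to save checkpoints under a local directory (a hypothetical path):

SAVE_DIR=./checkpoints bash run_phoenix_pos_embed_sp_test_3lvl.sh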
Testing
After training, you can run inference on the test set by specifying a checkpoint file. Note that CHECKPOINT_FILE_PATH points to a saved checkpoint file, rather than the CHECKPOINT folder. In the test-scripts folder, run:
CHECKPOINT=CHECKPOINT_FILE_PATH bash test_phoenix_pos_embed_sp_test_3lvl.sh
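For example, with fairseq's default checkpoint naming, the best checkpoint is saved as checkpoint_best.pt, so (reusing the hypothetical SAVE_DIR above):

CHECKPOINT=./checkpoints/checkpoint_best.pt bash test_phoenix_pos_embed_sp_test_3lvl.sh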
The script reports multiple metrics, including ROUGE-L and BLEU-{n}, as reported in the paper.
Alternative instructions for preparing the datasets yourself
- Text
Install the German word embeddings BPEmb: pip install bpemb.

Preprocess the translation texts into BPE tokens using preprocess_sign.py, once for each split, for example:
python preprocess_sign.py --save-vecs data/processed/emb data/ori/phoenix2014T.train.de data/processed/train.de
python preprocess_sign.py data/ori/phoenix2014T.test.de data/processed/test.de
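
To illustrate what this step produces, here is a minimal sketch using the BPEmb API directly; it is not the actual preprocess_sign.py implementation, and the vocabulary size of 25000 and dimension of 300 are assumptions (the vocabulary size is inferred from the sp25000 folder name):

# bpe_sketch.py -- illustrative only, not the real preprocess_sign.py;
# vs=25000 is inferred from the sp25000 folder, dim=300 is an assumption
from bpemb import BPEmb

bpemb_de = BPEmb(lang="de", vs=25000, dim=300)

with open("data/ori/phoenix2014T.train.de") as fin, \
     open("data/processed/train.de", "w") as fout:
    for line in fin:
        # encode() splits a sentence into BPE subword tokens
        fout.write(" ".join(bpemb_de.encode(line.strip())) + "\n")

# the pretrained subword embedding matrix, usable for --save-vecs
print(bpemb_de.vectors.shape)  # (vocabulary size, embedding dimension)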
- Vocabulary
Generate the dictionary file dict.de.txt:
fairseq-preprocess --source-lang de --target-lang de --trainpref data/processed/train --testpref data/processed/test --destdir data-bin/ --dataset-impl raw
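
The resulting dict.de.txt uses fairseq's plain-text dictionary format: one subword token and its corpus frequency per line. The tokens and counts below are illustrative, not from the actual corpus:

▁heute 876
▁wetter 654
en 431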
- Video

Prepare sign videos and the corresponding video features (e.g., extracted by pretrained I3D networks), and create a JSON file for each split (e.g., train.sign-de.sign). The JSON file should follow the format below: it must have the same number of entries as the text file, where each entry corresponds to the sentence on the same line number in the prepared text file.
[
    {
        "ident": "VIDEO_ID",
        "size": 64  // length of the video features
    },
    ...
]
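
A minimal sketch for generating such a JSON file; the one-file-per-video layout, the .npy format, and the paths are all assumptions for illustration:

# build_sign_json.py -- a sketch under assumed file layout, not part of the repo
import json, os
import numpy as np

feat_dir = "i3d-features/span=8_stride=2"  # hypothetical feature folder
entries = []
# note: entries must follow the sentence order of the prepared text file;
# sorted() here is only a placeholder for that ordering
for fname in sorted(os.listdir(feat_dir)):
    if fname.endswith(".npy"):
        feats = np.load(os.path.join(feat_dir, fname))
        entries.append({"ident": os.path.splitext(fname)[0],
                        "size": len(feats)})  # number of feature windows

with open("data-bin/train.sign-de.sign", "w") as f:
    json.dump(entries, f, indent=2)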
- Finally, arrange the text files, video JSON files, word embeddings, and the vocabulary file into a folder as below:
data-bin/
├── train.sign-de.sign
├── train.sign-de.de
│
├── test.sign-de.sign
├── test.sign-de.de
│
├── emb
└── dict.de.txt
Citations
Please cite our paper and the WLASL dataset (used for pre-training) as:
@inproceedings{li2020tspnet,
    title     = {TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation},
    author    = {Li, Dongxu and Xu, Chenchen and Yu, Xin and Zhang, Kaihao and Swift, Benjamin and Suominen, Hanna and Li, Hongdong},
    year      = {2020},
    booktitle = {Advances in Neural Information Processing Systems},
    volume    = {33}
}

@inproceedings{li2020word,
    title     = {Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison},
    author    = {Li, Dongxu and Rodriguez, Cristian and Yu, Xin and Li, Hongdong},
    year      = {2020},
    booktitle = {The IEEE Winter Conference on Applications of Computer Vision},
    pages     = {1459--1469}
}
Other works you might be interested in:
@inproceedings{li2020transferring,
    title     = {Transferring Cross-domain Knowledge for Video Sign Language Recognition},
    author    = {Li, Dongxu and Yu, Xin and Xu, Chenchen and Petersson, Lars and Li, Hongdong},
    year      = {2020},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages     = {6205--6214}
}