Learn an Effective Lip Reading Model without Pains


Introduction

This is the repository for An Efficient Software for Building Lip Reading Models Without Pains. It provides a complete deep lip-reading pipeline together with pre-trained models and training settings. We evaluate the pipeline on the LRW and LRW-1000 datasets, obtaining 88.4% and 56.0% accuracy, respectively. These results are comparable to, and in some cases surpass, the current state of the art; in particular, we achieve the current state-of-the-art result (56.0%) on LRW-1000.

Benchmark

| Year | Method | LRW | LRW-1000 |
| :--- | :--- | :---: | :---: |
| 2017 | Chung et al. | 61.1% | 25.7% |
| 2017 | Stafylakis et al. | 83.5% | 38.2% |
| 2018 | Stafylakis et al. | 88.8% | - |
| 2019 | Yang et al. | - | 38.19% |
| 2019 | Wang et al. | 83.3% | 36.9% |
| 2019 | Weng et al. | 84.1% | - |
| 2020 | Luo et al. | 83.5% | 38.7% |
| 2020 | Zhao et al. | 84.4% | 38.7% |
| 2020 | Zhang et al. | 85.0% | 45.2% |
| 2020 | Martinez et al. | 85.3% | 41.4% |
| 2020 | Ma et al. | 87.7% | 43.2% |
| 2020 | ResNet18 + BiGRU (Baseline + Cosine LR) | 85.0% | 47.1% |
| 2020 | ResNet18 + BiGRU (Baseline with word boundary + Cosine LR) | 87.5% | 55.0% |
| 2020 | Our Method | 86.2% | 48.3% |
| 2020 | Our Method (with word boundary) | 88.4% | 56.0% |

Dataset Preparation

  1. Download the LRW and LRW-1000 datasets and link lrw_mp4 and LRW1000_Public in the root of this repository:
ln -s PATH_TO_DATA/lrw_mp4 .
ln -s PATH_TO_DATA/LRW1000_Public .
  2. Run scripts/prepare_lrw.py and scripts/prepare_lrw1000.py to generate training samples for LRW and LRW-1000, respectively:
python scripts/prepare_lrw.py
python scripts/prepare_lrw1000.py 

The mouth videos, labels, and word boundary information will be saved in .pkl format. Image sequences are packed as JPEG-encoded frames inside the .pkl files and decoded with PyTurboJPEG at load time. If you want to use your own dataset, you may need to modify utils/dataset.py.
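
For reference, a minimal sketch of reading one packed sample is shown below. The path and the field names ('video', 'label', 'duration') are assumptions for illustration; see utils/dataset.py for the exact layout used by the pipeline.

```python
import pickle

from turbojpeg import TurboJPEG  # PyTurboJPEG

jpeg = TurboJPEG()

# Illustrative path; the prepare scripts above decide the actual output location.
with open('PATH_TO_PKL/ABOUT_00001.pkl', 'rb') as f:
    sample = pickle.load(f)

# Each frame is stored as a JPEG byte string; decode it back to an image array.
frames = [jpeg.decode(buf) for buf in sample['video']]
label = sample['label']            # class index of the spoken word
boundary = sample.get('duration')  # word boundary information, if present

print(len(frames), frames[0].shape, label)
```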

Pretrained Weights

We provide pretrained weights on the LRW and LRW-1000 datasets for evaluation. For smaller datasets, these pretrained weights can provide a good starting point for feature extraction, fine-tuning, and so on.

Link of pretrained weights: Baidu Yun (code: ivgl)

If you cannot access the provided links, please email dalu.feng@vipl.ict.ac.cn or fengdalu@gmail.com.
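
As a starting point for fine-tuning, a minimal sketch of loading a downloaded checkpoint is shown below. The placeholder model and the checkpoint key layout are assumptions for illustration; main_visual.py contains the actual model definition and loading code.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the ResNet18 + BiGRU model used in this
# repository; replace it with the real model class when fine-tuning.
class PlaceholderModel(nn.Module):
    def __init__(self, n_class=500):
        super().__init__()
        self.backbone = nn.Identity()
        self.fc = nn.Linear(512, n_class)

    def forward(self, x):
        return self.fc(self.backbone(x))

model = PlaceholderModel(n_class=500)

# map_location='cpu' lets the checkpoint be inspected without a GPU.
checkpoint = torch.load('checkpoints/lrw-cosine-lr-acc-0.85080.pt', map_location='cpu')

# The key layout inside the checkpoint is an assumption; print the keys
# to see how the weights are actually stored.
state_dict = checkpoint.get('video_model', checkpoint) if isinstance(checkpoint, dict) else checkpoint

# strict=False skips keys that do not match, which is convenient when reusing
# only the pretrained feature extractor with a new classification head.
model.load_state_dict(state_dict, strict=False)
```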

How to test

To test with the provided weights, download them and place them in the root of this repository.

For example, to test the baseline on the LRW dataset:

python main_visual.py \
    --gpus='0'  \
    --lr=0.0 \
    --batch_size=128 \
    --num_workers=8 \
    --max_epoch=120 \
    --test=True \
    --save_prefix='checkpoints/lrw-baseline/' \
    --n_class=500 \
    --dataset='lrw' \
    --border=False \
    --mixup=False \
    --label_smooth=False \
    --se=False \
    --weights='checkpoints/lrw-cosine-lr-acc-0.85080.pt'

To test our model on the LRW-1000 dataset:

python main_visual.py \
    --gpus='0'  \
    --lr=0.0 \
    --batch_size=128 \
    --num_workers=8 \
    --max_epoch=120 \
    --test=True \
    --save_prefix='checkpoints/lrw-1000-final/' \
    --n_class=1000 \
    --dataset='lrw1000' \
    --border=True \
    --mixup=False \
    --label_smooth=False \
    --se=True \
    --weights='checkpoints/lrw1000-border-se-mixup-label-smooth-cosine-lr-wd-1e-4-acc-0.56023.pt'

How to train

For example, to train the LRW baseline:

python main_visual.py \
    --gpus='0,1,2,3'  \
    --lr=3e-4 \
    --batch_size=400 \
    --num_workers=8 \
    --max_epoch=120 \
    --test=False \
    --save_prefix='checkpoints/lrw-baseline/' \
    --n_class=500 \
    --dataset='lrw' \
    --border=False \
    --mixup=False \
    --label_smooth=False \
    --se=False  

Optional arguments correspond to the flags shown in the example commands above.

More training details and settings can be found in our paper. We plan to include more pretrained models in the future.
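
Two ingredients of the recipe from the benchmark table, the cosine learning-rate schedule and label smoothing (the --label_smooth flag), can be sketched with standard PyTorch components as follows. This is an illustration of the techniques, not the exact code in main_visual.py; the toy model and the 0.1 smoothing value are assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-in for the lip-reading network; only the optimization recipe matters here.
model = nn.Linear(512, 500)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)

# Cosine learning-rate schedule over the whole run (max_epoch=120 in the command above).
max_epoch = 120
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epoch)

# Label smoothing; the 0.1 value is an assumed, commonly used default.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for epoch in range(max_epoch):
    # ... one epoch of training with `criterion` and `optimizer` goes here ...
    scheduler.step()  # move the learning rate along the cosine curve
```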

Dependencies

The pipeline is built on PyTorch and uses PyTurboJPEG to decode the JPEG frames packed in the .pkl files.

Citation

If you find this code useful in your research, please consider citing the following papers:

@inproceedings{feng2021efficient,
  title        = {An Efficient Software for Building LIP Reading Models Without Pains},
  author       = {Feng, Dalu and Yang, Shuang and Shan, Shiguang},
  booktitle    = {2021 IEEE International Conference on Multimedia \& Expo Workshops (ICMEW)},
  pages        = {1--2},
  year         = {2021},
  organization = {IEEE}
}

@article{feng2020learn,
  title        = {Learn an Effective Lip Reading Model without Pains},
  author       = {Feng, Dalu and Yang, Shuang and Shan, Shiguang and Chen, Xilin},
  journal      = {arXiv preprint arXiv:2011.07557},
  year         = {2020}
}

License

The MIT License