
LipNet: End-to-End Sentence-level Lipreading

The PyTorch implementation of 'LipNet: End-to-End Sentence-level Lipreading' by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas (https://arxiv.org/abs/1611.01599). We build the LipNet model in PyTorch with minor changes. This reproduction achieves 13.3%/4.6% WER on the unseen/overlapped test splits, exceeding all evaluation metrics in the original paper and reaching state-of-the-art performance.

Demo

LipNet Demo

Results

| Scenario | Image Size (W x H) | CER | WER |
|---|---|---|---|
| Unseen speakers (Origin) | 100 x 50 | 6.7% | 13.6% |
| Overlapped speakers (Origin) | 100 x 50 | 2.0% | 5.6% |
| Unseen speakers (Ours) | 128 x 64 | 6.7% | 13.3% |
| Overlapped speakers (Ours) | 128 x 64 | 1.9% | 4.6% |
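
For reference, WER and CER are the word- and character-level edit distances between the prediction and the ground-truth sentence, normalized by the reference length. The snippet below is a minimal sketch of how these metrics can be computed; it is illustrative and not the evaluation code used in this repository.

```python
# Minimal sketch of WER/CER computation (illustrative, not this repository's evaluation code).
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution / match
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```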

Notes:

Data Statistics

Following the original split, we use s1, s2, s20, and s22 for unseen-speakers testing, and choose 255 random sentences from each speaker for overlapped-speakers testing.

| Scenario | Train | Validation |
|---|---|---|
| Unseen speakers (Origin) | 28775 | 3971 |
| Overlapped speakers (Origin) | 24331 | 8415 |
| Unseen speakers (Ours) | 28837 | 3986 |
| Overlapped speakers (Ours) | 24408 | 8415 |
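
For illustration, the sketch below shows one way such split lists could be generated from the extracted lip/ folder, holding out s1, s2, s20, and s22 as the unseen-speaker validation set. The directory layout and output file names are assumptions for the example, not this repository's actual preprocessing.

```python
# Illustrative sketch: build unseen-speaker train/val lists from the lip/ folder.
# Assumes lip/<speaker>/<video>/ holds one folder of frames per utterance (assumed layout).
import os

UNSEEN_VAL_SPEAKERS = {'s1', 's2', 's20', 's22'}  # test speakers in the original split

def build_unseen_lists(lip_root='lip', out_dir='data'):
    train, val = [], []
    for speaker in sorted(os.listdir(lip_root)):
        speaker_dir = os.path.join(lip_root, speaker)
        if not os.path.isdir(speaker_dir):
            continue
        target = val if speaker in UNSEEN_VAL_SPEAKERS else train
        for video in sorted(os.listdir(speaker_dir)):
            target.append(os.path.join(speaker, video))
    os.makedirs(out_dir, exist_ok=True)
    for name, items in (('unseen_train.txt', train), ('unseen_val.txt', val)):
        with open(os.path.join(out_dir, name), 'w') as f:
            f.write('\n'.join(items))

if __name__ == '__main__':
    build_unseen_lists()
```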

Data Preparation

We provide cropped lip images and annotation files in the following links:

BaiduYun (Code: jf0l)

The original GRID Corpus can be found here.

Download all parts and concatenate the files using the following commands:

cat GRID_LIP_160x80_TXT.zip.* > GRID_LIP_160x80_TXT.zip
unzip GRID_LIP_160x80_TXT.zip
rm GRID_LIP_160x80_TXT.zip

The extracted folder contains the lip and GRID_align_txt directories, which store the cropped lip images and the annotation files, respectively. You can create symbolic links to them inside the LipNet-PyTorch project:

ln -s PATH_OF_DOWNLOADED_DATA/lip LipNet-PyTorch/lip
ln -s PATH_OF_DOWNLOADED_DATA/GRID_align_txt LipNet-PyTorch/GRID_align_txt

Beyond the data we provide, if you want to build a complete lip-reading pipeline yourself, the scripts/ folder contains face detection and alignment code for reference. You can contact fengdalu@gmail.com or dalu.feng@vipl.ict.ac.cn for cooperation.
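
As a rough sketch of what that preprocessing involves, the example below crops the mouth region from a single frame using dlib's 68-point landmark model. The predictor path and crop size are assumptions; this is not the code shipped in scripts/.

```python
# Illustrative lip-cropping sketch using dlib 68-point landmarks
# (not the code in scripts/; the predictor path and crop size are assumptions).
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')  # assumed path

def crop_lip(frame_bgr, size=(128, 64)):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    landmarks = predictor(gray, faces[0])
    # Points 48-67 outline the mouth in the 68-point model.
    mouth = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in range(48, 68)])
    cx, cy = mouth.mean(axis=0).astype(int)
    w, h = size
    lip = frame_bgr[cy - h // 2:cy + h // 2, cx - w // 2:cx + w // 2]
    return cv2.resize(lip, size)
```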

Training and Testing

Run main.py to train and test the LipNet model:

python main.py

To monitor training progress:

tensorboard --logdir logs

Data paths and hyperparameters are configured in options.py. Note that you may need to modify it for the program to work as expected (e.g. data path, learning rate, batch size, and so on). A typical options.py looks like this:

gpu = '0'                                     # CUDA device id(s)
random_seed = 0
data_type = 'unseen'                          # which split to use: unseen or overlapped speakers
video_path = 'lip/'                           # cropped lip images
train_list = f'data/{data_type}_train.txt'
val_list = f'data/{data_type}_val.txt'
anno_path = 'GRID_align_txt'                  # GRID alignment (transcript) files
vid_padding = 75                              # pad every video to this many frames
txt_padding = 200                             # pad every transcript to this length
batch_size = 96
base_lr = 2e-5                                # base learning rate
num_workers = 16                              # DataLoader worker processes
max_epoch = 10000
display = 10                                  # log every N iterations
test_step = 1000                              # run validation every N iterations
save_prefix = f'weights/LipNet_{data_type}'   # checkpoint file prefix
is_optimize = True                            # set False to run evaluation only

# Optional: path to a pretrained checkpoint to load before training/testing.
weights = 'pretrain/LipNet_unseen_loss_0.44562849402427673_wer_0.1332580699113564_cer_0.06796452465503355.pt'
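
For orientation, LipNet combines a spatiotemporal convolutional front end with bidirectional GRUs and is trained with a CTC loss, as described in the paper. The sketch below shows what a single training step with PyTorch's built-in CTCLoss might look like; the toy module and tensor shapes are simplified stand-ins, not the actual model defined in main.py.

```python
# Illustrative single training step with CTC loss (a simplified toy, not the model in main.py).
import torch
import torch.nn as nn

class TinyLipNet(nn.Module):
    """Toy stand-in: one 3D conv block, spatial pooling, a bidirectional GRU, and a CTC head."""
    def __init__(self, num_classes=28):  # letters + space + CTC blank (assumed vocabulary size)
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        self.gru = nn.GRU(32, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):                          # x: (B, C, T, H, W)
        feat = self.frontend(x)                    # (B, 32, T, H/2, W/2)
        feat = feat.mean(dim=(3, 4))               # global spatial average -> (B, 32, T)
        out, _ = self.gru(feat.permute(0, 2, 1))   # (B, T, 512)
        return self.fc(out).log_softmax(-1)        # per-frame log-probabilities (B, T, num_classes)

model = TinyLipNet()
criterion = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)   # base_lr from options.py

video = torch.randn(2, 3, 75, 64, 128)     # a padded batch: 75 frames of 128 x 64 lip crops
targets = torch.randint(1, 28, (2, 20))    # padded label indices (blank index 0 excluded)
log_probs = model(video).permute(1, 0, 2)  # CTCLoss expects (T, B, C)
loss = criterion(log_probs, targets,
                 torch.full((2,), 75, dtype=torch.long),    # input lengths
                 torch.full((2,), 20, dtype=torch.long))    # target lengths
optimizer.zero_grad()
loss.backward()
optimizer.step()
```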

Optional arguments:

Simple demo

We provide a simple demo of LipNet. Run python demo.py PATH_TO_YOUR_MP4 to try it on your own video. :)
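
Before inference, the demo has to turn the MP4 into a sequence of frames. A minimal way to do that with OpenCV is sketched below; whether demo.py uses exactly this approach is an assumption.

```python
# Minimal frame-extraction sketch with OpenCV (demo.py may differ).
import cv2

def load_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames  # list of H x W x 3 RGB arrays
```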

Dependencies

Bibtex

@article{assael2016lipnet,
  title={LipNet: End-to-End Sentence-level Lipreading},
  author={Assael, Yannis M and Shillingford, Brendan and Whiteson, Shimon and de Freitas, Nando},
  journal={GPU Technology Conference},
  year={2017},
  url={https://github.com/Fengdalu/LipNet-PyTorch}
}

License

The MIT License