Revisiting End-to-End Speech-to-Text Translation From Scratch

Paper | Highlights | Overview | Model | Training&Eval | Citation | Updates

This repository contains the source code, pretrained models, and instructions for our ICML 2022 paper.

Note: by ST from scratch, we refer to the setup where ST models are trained on speech-translation pairs alone, without using transcripts or any type of pretraining.

By pretraining, we mainly refer to ASR/MT pretraining using the triplet training data (speech, transcript, translation).

Updates

Paper Highlights

We explore the extent to which the quality of end-to-end speech translation models trained from scratch, i.e., on speech-translation pairs alone, can be improved.

Model Visualization

Overview of our proposal

Apart from the parameterized distance penalty, we propose to jointly apply the MLE and CTC objectives during training; notably, we use the translation (rather than the transcript) as the CTC labels.
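For illustration, here is a minimal sketch of how such a joint MLE + CTC objective can be wired up. This is not the repository's actual implementation: it assumes TF2-style APIs, and the tensor names (`decoder_logits`, `encoder_logits`, `targets`, `frame_lengths`), the `ctc_weight` value, and the blank index are all hypothetical. The parameterized distance penalty lives inside the encoder's attention and is omitted here; see the paper for its exact form.

```python
import tensorflow as tf

def joint_st_loss(decoder_logits, encoder_logits, targets,
                  target_lengths, frame_lengths,
                  ctc_weight=0.3, blank_index=0):
    """Sketch of a joint MLE + CTC objective for ST from scratch.

    decoder_logits: [batch, target_len, vocab] from the autoregressive decoder.
    encoder_logits: [batch, frames, vocab] projected from the speech encoder.
    targets:        [batch, target_len] translation token ids
                    (also used as CTC labels, as proposed above).
    ctc_weight, blank_index: hypothetical hyperparameters for this sketch.
    """
    # Standard MLE (cross-entropy) loss on the decoder's translation output.
    xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=decoder_logits)
    mask = tf.sequence_mask(target_lengths, tf.shape(targets)[1], dtype=xent.dtype)
    mle_loss = tf.reduce_sum(xent * mask) / tf.reduce_sum(mask)

    # CTC regularizer: the translation (not the transcript) serves as the
    # label sequence aligned against encoder states.
    ctc = tf.nn.ctc_loss(
        labels=targets,
        logits=encoder_logits,
        label_length=target_lengths,
        logit_length=frame_lengths,
        logits_time_major=False,
        blank_index=blank_index)
    ctc_loss = tf.reduce_mean(ctc)

    return mle_loss + ctc_weight * ctc_loss
```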

Pretrained Models

| Model | BLEU on MuST-C En-De |
|---|---|
| Fairseq (pretrain-finetune) | 22.7 |
| NeurST (pretrain-finetune) | 22.8 |
| Espnet (pretrain-finetune) | 22.9 |
| This work (ST from scratch) | 22.7 |

Requirements

The source code is based on an older version of TensorFlow.

Training and Evaluation

Please check out the example for reference.

Citation

If you draw any inspiration from our study, please consider citing our paper:

@inproceedings{zhang2022revisiting,
  title     = {Revisiting End-to-End Speech-to-Text Translation From Scratch},
  author    = {Biao Zhang and Barry Haddow and Rico Sennrich},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
}