Awesome End-to-End Speech Translation Progress
Tutorial
- EACL 2021 tutorial: Speech Translation
- Blog: Getting Started with End-to-End Speech Translation
- ACL 2020 Theme paper: Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
- INTERSPEECH 2019 survey talk: Spoken Language Translation
Data
Corpus | Direction | Target Modality | Duration | License |
---|---|---|---|---|
CoVoST 2 | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En and En -> {De, Ca, Zh, Fa, Et, Mn, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} | Text | 2880h | CC0 |
CVSS | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En | Text & Speech | 1900h | CC BY 4.0 |
mTEDx | {Es, Fr, Pt, It, Ru, El} -> En, {Fr, Pt, It} -> Es, Es -> {Fr, It}, {Es, Fr} -> Pt | Text | 765h | CC BY-NC-ND 4.0 |
CoVoST | {Fr, De, Nl, Ru, Es, It, Tr, Fa, Sv, Mn, Zh} -> En | Text | 700h | CC0 |
MuST-C & MuST-Cinema | En -> {De, Es, Fr, It, Nl, Pt, Ro, Ru, Ar, Cs, Fa, Tr, Vi, Zh} | Text | 504h | CC BY-NC-ND 4.0 |
How2 | En -> Pt | Text | 300h | YouTube & CC BY-SA 4.0 |
Europarl-ST | {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} -> {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} | Text | 280h | CC BY-NC 4.0 |
Augmented LibriSpeech | En -> Fr | Text | 236h | CC BY 4.0 |
Kosp2e | Ko -> En | Text | 198h | Mixed CC |
Fisher + Callhome | Es -> En | Text | 160h+20h | LDC |
MaSS | parallel among En, Es, Eu, Fi, Fr, Hu, Ro and Ru | Text & Speech | 172h | Bible.is |
LibriVoxDeEn | De -> En | Text | 110h | CC BY-NC-SA 4.0 |
Prabhupadavani | parallel among En, Fr, De, Gu, Hi, Hu, Id, It, Lv, Lt, Ne, Fa, Pl, Pt, Ru, Sl, Sk, Es, Se, Ta, Te, Tr, Bg, Hr, Da and Nl | Text | 94h | |
BSTC | Zh -> En | Text | 68h | |
LibriS2S | De <-> En | Text & Speech | 52h/57h | CC BY-NC-SA 4.0 |
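Several of the corpora above are also available programmatically. Below is a minimal sketch of loading a CoVoST 2 split via the Hugging Face `datasets` library; the `facebook/covost2` loader name, the `fr_en` config, the field names, and the requirement of a locally extracted Common Voice archive passed as `data_dir` are assumptions based on that loader's documentation (script-based loaders also require an older `datasets` release), not part of this list.

```python
# Minimal sketch (not from this list): loading a CoVoST 2 split with the
# Hugging Face `datasets` library. Assumes the `facebook/covost2` script
# loader, a `datasets` release that still supports dataset scripts, and a
# locally extracted Common Voice archive for the source language.
from datasets import load_dataset

covost2 = load_dataset(
    "facebook/covost2",
    "fr_en",                              # French speech -> English text
    data_dir="/path/to/common_voice/fr",  # hypothetical local path
)

sample = covost2["train"][0]
print(sample["sentence"])     # source-language (French) transcript
print(sample["translation"])  # target-language (English) translation
audio = sample["audio"]       # decoded waveform plus sampling rate
print(audio["sampling_rate"], len(audio["array"]))
```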
Toolkit
- ESPnet-ST: All-in-One Speech Translation Toolkit (see the ACL 2020 demo paper below)
- fairseq S2T: Fast Speech-to-Text Modeling with fairseq (see the AACL 2020 demo paper below)
- NeurST: Neural Speech Translation Toolkit (see the ACL 2021 demo paper below)
Paper
2023
- [arXiv] Tuning Large language model for End-to-end Speech Translation
- [arXiv] Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
- [arXiv] Multilingual Speech-to-Speech Translation into Multiple Target Languages
- [ICCV] MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
- [INTERSPEECH] MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
- [INTERSPEECH] Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
- [INTERSPEECH] Joint Speech Translation and Named Entity Recognition
- [INTERSPEECH] StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
- [INTERSPEECH] Knowledge Distillation on Joint Task End-to-End Speech Translation
- [INTERSPEECH] GigaST: A 10,000-hour Pseudo Speech Translation Corpus
- [INTERSPEECH] Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
- [INTERSPEECH] AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
- [INTERSPEECH] Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
- [INTERSPEECH] HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
- [INTERSPEECH] Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models
- [ICML] Pre-training for Speech Translation: CTC Meets Optimal Transport
- [ACL] UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
- [ACL] Simple and Effective Unsupervised Speech Translation
- [ACL] BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
- [ACL] SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
- [ACL] Understanding and Bridging the Modality Gap for Speech Translation
- [ACL] Back Translation for Speech-to-text Translation Without Transcripts
- [ACL] AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
- [ACL] WACO: Word-Aligned Contrastive Learning for Speech Translation
- [ACL] Attention as a guide for Simultaneous Speech Translation
- [ACL Findings] Speech-to-Speech Translation for a Real-world Unwritten Language
- [ACL Findings] CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation
- [ACL Findings] Duplex Diffusion Models Improve Speech-to-Speech Translation
- [ACL Findings] DUB: Discrete Unit Back-translation for Speech Translation
- [ACL Findings] End-to-End Simultaneous Speech Translation with Differentiable Segmentation
- [ACL Findings] Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
- [ACL Findings] Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
- [ICASSP] Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
- [ICASSP] M3ST: Mix at Three Levels for Speech Translation
- [EACL Findings] Generating Synthetic Speech from SpokenVocab for Speech Translation
- [AAAI] Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
2022
- [arXiv] AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation
- [arXiv] Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features
- [arXiv] ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English
- [arXiv] Multilingual Simultaneous Speech Translation
- [arXiv] Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages
- [EMNLP Findings] Does Simultaneous Speech Translation need Simultaneous Models?
- [EMNLP Findings] RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise
- [INTERSPEECH] Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
- [INTERSPEECH] Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
- [INTERSPEECH] Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
- [INTERSPEECH] Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
- [INTERSPEECH] Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
- [INTERSPEECH] SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
- [INTERSPEECH] Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
- [INTERSPEECH] M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation
- [NAACL] Textless Speech-to-Speech Translation on Real Data
- [ICML] Revisiting End-to-End Speech-to-Text Translation From Scratch
- [ICML] Translatotron 2: Robust direct speech-to-speech translation
- [ACL] Learning When to Translate for Streaming Speech
- [ACL] Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
- [ACL] UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation
- [ACL] Direct speech-to-speech translation with discrete units
- [ACL] STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
- [ACL Findings] End-to-End Speech Translation for Code Switched Speech
- [LREC] CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
- [LREC] LibriS2S: A German-English Speech-to-Speech Translation Corpus
- [ICASSP] Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques
- [Neural Networks] Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing
- [AAAI] Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement
2021
- [arXiv] Efficient Transformer for Direct Speech Translation
- [arXiv] Zero-shot Speech Translation
- [arXiv] Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention
- [ASRU] Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
- [ASRU] Assessing Evaluation Metrics for Speech-to-Speech Translation
- [ASRU] Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders
- [ICNLSP] Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation
- [INTERSPEECH] Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
- [EMNLP] Speechformer: Reducing Information Loss in Direct Speech Translation
- [EMNLP] Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation
- [EMNLP] Mutual-Learning Improves End-to-End Speech Translation
- [INTERSPEECH] End-to-end Speech Translation via Cross-modal Progressive Training
- [INTERSPEECH] CoVoST 2 and Massively Multilingual Speech-to-Text Translation
- [INTERSPEECH] The Multilingual TEDx Corpus for Speech Recognition and Translation
- [INTERSPEECH] Large-Scale Self- and Semi-Supervised Learning for Speech Translation
- [INTERSPEECH] Kosp2e: Korean Speech to English Translation Corpus
- [INTERSPEECH] AlloST: Low-resource Speech Translation without Source Transcription
- [INTERSPEECH] SpecRec: An Alternative Solution for Improving End-to-End Speech-to-Text Translation via Spectrogram Reconstruction
- [INTERSPEECH] Optimally Encoding Inductive Biases into the Transformer Improves End-to-End Speech Translation
- [INTERSPEECH] ASR Posterior-based Loss for Multi-task End-to-end Speech Translation
- [AMTA] Simultaneous Speech Translation for Live Subtitling: from Delay to Display
- [ACL] Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
- [ACL] Multilingual Speech Translation with Efficient Finetuning of Pretrained Models
- [ACL] Lightweight Adapter Tuning for Multilingual Speech Translation
- [ACL] Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?
- [ACL] Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task
- [ACL] Beyond Sentence-Level End-to-End Speech Translation: Context Helps
- [ACL Findings] Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR
- [ACL Findings] AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
- [ACL Findings] RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer
- [ACL Findings] Learning Shared Semantic Space for Speech-to-Text Translation
- [ACL Findings] Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation
- [ACL Findings] How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation
- [ACL Demo] NeurST: Neural Speech Translation Toolkit
- [ICML] Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation
- [NAACL] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
- [NAACL] Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
- [NAACL AutoSimTrans] BSTC: A Large-Scale Chinese-English Speech Translation Dataset
- [AmericasNLP] Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation
- [ICASSP] Task Aware Multi-Task Learning for Speech to Text Tasks
- [ICASSP] A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
- [ICASSP] An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies
- [ICASSP] Streaming Simultaneous Speech Translation with Augmented Memory Transformer
- [ICASSP] Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder
- [ICASSP] Cascaded Models With Cyclic Feedback For Direct Speech Translation
- [ICASSP] Jointly Trained Transformers models for Spoken Language Translation
- [ICASSP] Efficient Use of End-to-end Data in Spoken Language Processing
- [EACL] CTC-based Compression for Direct Speech Translation
- [EACL] Streaming Models for Joint Speech Recognition and Translation
- [IberSPEECH] mintzai-ST: Corpus and Baselines for Basque-Spanish Speech Translation
- [AAAI] Consecutive Decoding for Speech-to-text Translation
- [AAAI] UWSpeech: Speech to Speech Translation for Unwritten Languages
- [AAAI] "Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
- [SLT] Tight Integrated End-to-End Training for Cascaded Speech Translation
- [SLT] Transformer-based Direct Speech-to-speech Translation with Transcoder
2020
- [arXiv] Bridging the Modality Gap for Speech-to-Text Translation
- [arXiv] CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning
- [CLiC-IT] On Knowledge Distillation for Direct Speech Translation
- [COLING] Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
- [COLING] Breeding Gender-aware Direct Speech Translation Systems
- [AACL] SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation
- [AACL Demo] fairseq S2T: Fast Speech-to-Text Modeling with fairseq
- [EMNLP] Effectively pretraining a speech translation decoder with Machine Translation data
- [EMNLP Findings] Adaptive Feature Selection for End-to-End Speech Translation
- [AMTA] On Target Segmentation for Direct Speech Translation
- [INTERSPEECH] Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
- [INTERSPEECH] Relative Positional Encoding for Speech Recognition and Direct Translation
- [INTERSPEECH] Contextualized Translation of Automatically Segmented Speech
- [INTERSPEECH] Self-Training for End-to-End Speech Translation
- [INTERSPEECH] Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
- [INTERSPEECH] Self-Supervised Representations Improve End-to-End Speech Translation
- [INTERSPEECH] Investigating Self-Supervised Pre-Training for End-to-End Speech Translation
- [TACL] Consistent Transcription and Translation of Speech
- [ACL] Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation
- [ACL] Phone Features Improve Speech Translation
- [ACL] Curriculum Pre-training for End-to-End Speech Translation
- [ACL] SimulSpeech: End-to-End Simultaneous Speech to Text Translation
- [ACL] Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
- [ACL Theme] Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
- [ACL Demo] ESPnet-ST: All-in-One Speech Translation Toolkit
- [LREC] CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
- [LREC] MuST-Cinema: a Speech-to-Subtitles corpus
- [LREC] MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
- [LREC] LibriVoxDeEn: A Corpus for German-to-English Speech Translation and Speech Recognition
- [ICASSP] Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates
- [ICASSP] Instance-Based Model Adaptation For Direct Speech Translation
- [ICASSP] Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning
- [ICASSP] Analyzing ASR pretraining for low-resource speech-to-text translation
- [ICASSP] End-to-End Speech Translation with Self-Contained Vocabulary Manipulation
- [AAAI] Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
- [AAAI] Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
2019
- [ASRU] One-To-Many Multilingual End-to-end Speech Translation
- [ASRU] Multilingual End-to-End Speech Translation
- [ASRU] Speech-to-speech Translation between Untranscribed Unknown Languages
- [ASRU] A Comparative Study on End-to-end Speech to Text Translation
- [IWSLT] Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade
- [IWSLT] On Using SpecAugment for End-to-End Speech Translation
- [INTERSPEECH] End-to-End Speech Translation with Knowledge Distillation
- [INTERSPEECH] Adapting Transformer to End-to-end Spoken Language Translation
- [INTERSPEECH] Direct speech-to-speech translation with a sequence-to-sequence model
- [ACL] Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation
- [ACL] Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation
- [NAACL] Pre-training on High-Resource Speech Recognition Improves Low-Resource Speech-to-Text Translation
- [NAACL] MuST-C: a Multilingual Speech Translation Corpus
- [NAACL] Fluent Translations from Disfluent Speech in End-to-End Speech Translation
- [ICASSP] Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
- [ICASSP] Towards unsupervised speech-to-text translation
- [ICASSP] Towards End-to-end Speech-to-text Translation with Two-pass Decoding
2018
- [NeurIPS] How2: A Large-scale Dataset for Multimodal Language Understanding
- [IberSPEECH] End-to-End Speech Translation with the Transformer
- [INTERSPEECH] Low-Resource Speech-to-Text Translation
- [LREC] Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
- [NAACL] Tied multitask learning for neural speech translation
- [ICASSP] End-to-End Automatic Speech Translation of Audiobooks
2017
- [INTERSPEECH] Sequence-to-Sequence Models Can Directly Translate Foreign Speech
- [INTERSPEECH] Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation
- [EACL] Towards speech-to-text translation without speech recognition
2016
- [NIPS Workshop] Listen and translate: A proof of concept for end-to-end speech-to-text translation
- [NAACL] An Attentional Model for Speech Translation Without Transcription
Contact
Changhan Wang (wangchanghan@gmail.com)