# Machine Translation Reading List

This is a machine translation reading list maintained by the Tsinghua Natural Language Processing Group.

The past three decades have witnessed the rapid development of machine translation, especially of data-driven approaches such as statistical machine translation (SMT) and neural machine translation (NMT). Because NMT is currently the dominant paradigm, we give priority to collecting important, up-to-date NMT papers; the Edinburgh/JHU MT research survey wiki has good coverage of older papers, with a brief description of each sub-topic of MT. Our list is still incomplete and the categorization may be imperfect. We will keep adding papers and improving the list. Any suggestions are welcome!

<h2 id="10_must_reads">10 Must Reads</h2>

<h2 id="surveys">Tutorials and Surveys</h2>

<h2 id="statistical_machine_translation">Statistical Machine Translation</h2>

<h3 id="word_based_models">Word-based Models</h3>

<h3 id="phrase_based_models">Phrase-based Models</h3>

<h3 id="syntax_based_models">Syntax-based Models</h3>

<h3 id="discriminative_training">Discriminative Training</h3>

<h3 id="system_combination">System Combination</h3>

<h3 id="human_centered_smt">Human-centered SMT</h3>

<h4 id="interactive">Interactive SMT</h4>

<h4 id="adaptation_smt">Adaptation</h4>

<h2 id="evaluation">Evaluation</h2>

<h2 id="neural_machine_translation">Neural Machine Translation</h2>

<h3 id="model_architecture">Model Architecture</h3>

<h3 id="attention_mechanism">Attention Mechanism</h3>

<h3 id="open_vocabulary">Open Vocabulary</h3>

<h3 id="training">Training Objectives and Frameworks</h3>

<h3 id="decoding">Decoding</h3>

<h3 id="low_resource_language_translation">Low-resource Language Translation</h3>

<h4 id="semi_supervised">Semi-supervised Learning</h4>

<h4 id="unsupervised">Unsupervised Learning</h4>

<h4 id="pivot_based">Pivot-based Methods</h4>

<h4 id="data_augmentation">Data Augmentation Methods</h4>

<h4 id="data_selection">Data Selection Methods</h4>

<h4 id="transfer_learning">Transfer Learning</h4>

<h4 id="meta_learning">Meta Learning</h4>

<h3 id="multilingual">Multilingual Machine Translation</h3>

<h3 id="prior_knowledge_integration">Prior Knowledge Integration</h3>

<h4 id="word_phrase_constraints">Word/Phrase Constraints</h4>

<h4 id="syntactic_semantic_constraints">Syntactic/Semantic Constraints</h4>

<h4 id="coverage_constraints">Coverage Constraints</h4>

<h3 id="document_level_translation">Document-level Translation</h3>

<h3 id="robustness">Robustness</h3>

<h3 id="interpretability">Interpretability</h3>

<h3 id="linguistic_interpretation">Linguistic Interpretation</h3>

<h3 id="fairness_and_diversity">Fairness and Diversity</h3>

<h3 id="efficiency">Efficiency</h3>

<h3 id="pre_training">Pre-Training</h3>

<h3 id="NAT">Non-Autoregressive Translation</h3>

<h3 id="speech_translation_and_simultaneous_translation">Speech Translation and Simultaneous Translation</h3>

<h3 id="multi_modality">Multi-modality</h3>

<h3 id="ensemble_reranking">Ensemble and Reranking</h3>

<h3 id="domain_adaptation">Domain Adaptation</h3>

<h3 id="quality_estimation">Quality Estimation</h3>

<h3 id="human_centered">Human-centered NMT</h3>

<h4 id="interactive_nmt">Interactive NMT</h4>

<h4 id="ape">Automatic Post-Editing</h4>

<h3 id="poetry_translation">Poetry Translation</h3>

<h3 id="eco_friendly">Eco-friendly</h3>

<h3 id="compositional_generalization">Compositional Generalization</h3>

<h3 id="endangered">Endangered Language Revitalization</h3>

<h2 id="word_translation">Word Translation</h2>

<h2 id="wmt_winners">WMT Winners</h2>

WMT is the most important annual international competition on machine translation. We collect the results of the news translation task since WMT 2016 (the First Conference on Machine Translation) and summarize the techniques used in the top-performing systems. Currently, we focus on four translation directions: ZH-EN, EN-ZH, DE-EN, and EN-DE. The summarized techniques might be incomplete; your suggestions are welcome!

<h3 id="wmt19">WMT 2019</h3>

<h3 id="wmt18">WMT 2018</h3>

<h3 id="wmt17">WMT 2017</h3>

<h3 id="wmt16">WMT 2016</h3>