charNMT-noise
Scripts and noise data for Belinkov & Bisk, "Synthetic and Natural Noise Both Break Neural Machine Translation", ICLR 2018.
MT Data
The experiments reported in the paper are conducted on the TED talks corpus prepared for IWSLT 2016, which is available on the WIT<sup>3</sup> website.
Pretrained Models
Nematus: http://data.statmt.org/rsennrich/wmt16_systems/
char2char: https://github.com/nyu-dl/dl4mt-c2c
Sources of Natural Noise
French:
Aurélien Max and Guillaume Wisniewski. "Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History". LREC 2010. (corpus)
German:
Katrin Wisniewski et al. "MERLIN: An online trilingual learner corpus empirically grounding the European Reference Levels in authentic learner data". 2013. (corpus1, corpus2)
Czech:
Karel Šebesta et al. "CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)". Tech Report, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University. 2017. (corpus)