Neural Japanese Transliteration—can you do better than SwiftKey™ Keyboard?

Abstract

In this project, I examine how well neural networks can convert Roman letters into Japanese script, i.e., Hiragana, Katakana, or Kanji. On 896 Japanese test sentences, the model outperforms the SwiftKey™ keyboard, a well-known multilingual smartphone keyboard, by a small margin in character error rate. It seems that neural networks can learn this task easily and quickly.

Requirements

Background

<img src="images/swiftkey_ja.gif" width="200" align="right">

Problem Formulation

I frame the problem as a seq2seq task.

Inputs: nihongo。<br> Outputs: 日本語。
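
To make the formulation concrete, here is a minimal sketch of how such a pair might be turned into character-level index sequences. The special tokens, the `build_vocab` and `encode` helpers, and the one-pair toy corpus are all assumptions for illustration, not the project's actual preprocessing.

```python
# Hypothetical special tokens; the project's real vocabulary may differ.
PAD, EOS = "<pad>", "<eos>"


def build_vocab(texts):
    """Map every character seen in `texts` to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    return {tok: idx for idx, tok in enumerate([PAD, EOS] + chars)}


def encode(text, vocab):
    """Convert a string to a list of character ids, ending with EOS."""
    return [vocab[ch] for ch in text] + [vocab[EOS]]


# Toy corpus containing only the example pair from the text.
pairs = [("nihongo。", "日本語。")]
src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)

src_ids = encode("nihongo。", src_vocab)  # 8 characters + EOS
tgt_ids = encode("日本語。", tgt_vocab)    # 4 characters + EOS
print(len(src_ids), len(tgt_ids))
```

The source and target sequences have different lengths, which is why an encoder-decoder (seq2seq) setup is a natural fit here, as opposed to a per-character tagging model.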

Data

Model Architecture

I adopted the encoder and the first decoder architecture of Tacotron, a speech synthesis model.

Contents

Training

Testing

Results

The training curve looks like this.

<img src="images/training_curve.png">

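The evaluation metric is CER (Character Error Rate), computed as the edit distance between the predicted and reference sentences divided by the number of characters in the reference sentences.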
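
As a rough illustration, CER can be computed as below. This is a sketch under the usual definition (character-level Levenshtein distance divided by reference length), pooled over the whole test set as the reported ratios suggest. The function names `edit_distance` and `corpus_cer` are hypothetical and not taken from the project's code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def corpus_cer(references, hypotheses):
    """Total edit distance divided by total reference characters."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_errors / total_chars


# One wrong character out of four reference characters.
print(corpus_cer(["日本語。"], ["日本後。"]))
```

Pooling errors and characters over all sentences, instead of averaging per-sentence rates, keeps short sentences from dominating the score.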

The following are the results after 13 epochs (79,898 global steps). Details are available in results/*.csv.

| Proposed (Greedy decoding) | Proposed (Beam decoding) | SwiftKey 6.4.8.57 |
|--|--|--|
| 1595/12057=0.132 | 1517/12057=0.125 | 1640/12057=0.136 |