Neural Networks for Data Selection

This repository contains the code for the paper "Neural Networks Classifier for Data Selection in Statistical Machine Translation".

It is built upon our fork of Keras (version 1.2) and has been tested with the Theano backend.

Features

Installation

Assuming pip is installed, run the following commands to clone the repository and install the packages it requires:

git clone https://github.com/lvapeab/sentence-selectioNN
cd sentence-selectioNN
pip install -r requirements.txt

sentence-selectioNN requires the following libraries:

Instructions:

Assuming you have a corpus:

  1. Check out the inputs/outputs of your model in data_engine/prepare_data.py

  2. If you want to use pretrained word vectors (GloVe or Word2Vec), run the preprocessing script that matches their format, binary or text.

  3. Set a model configuration in config.py

  4. Train the model:

python main.py
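
Step 2 above mentions preprocessing pretrained word vectors. As a rough illustration of what the text-format case involves, the sketch below parses lines of the form "word v1 v2 ... vn" into a dictionary. It is not the repository's preprocessing script: the function name `load_text_vectors`, the `vocabulary` filter and the sample values are hypothetical, and it assumes vectors with at least two dimensions.

```python
import io


def load_text_vectors(lines, vocabulary=None):
    """Parse text-format word vectors: one token per line, then its floats.

    `lines` is any iterable of strings (for example an open file).
    `vocabulary`, if given, restricts the result to those words.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:
            # Skip blank lines and the "<count> <dim>" header that
            # Word2Vec text files start with.
            continue
        word, values = parts[0], parts[1:]
        if vocabulary is not None and word not in vocabulary:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-in for a real vectors file (made-up numbers).
sample = io.StringIO("2 3\nhouse 0.1 0.2 0.3\ncasa 0.4 0.5 0.6\n")
vectors = load_text_vectors(sample, vocabulary={"casa"})
print(vectors)
```

Restricting the result to the corpus vocabulary keeps memory use down, since pretrained vector files usually cover far more words than a single corpus contains.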

Architecture

We support two network architectures, BLSTM and CNN. Each can be used at the monolingual or the bilingual level.

[Figure: NN_Classifier architecture diagram]

Please see the paper for a more detailed description of the model.
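
For intuition only, here is a generic sketch of the pattern commonly used by CNN sentence classifiers: a filter slides over consecutive word embeddings, and the strongest response is kept (max-over-time pooling). Whether the repository's CNN follows exactly this design is an assumption; the function `conv_max_over_time`, the ReLU activation and the toy values are illustrative and are not taken from the code or the paper.

```python
def conv_max_over_time(embeddings, filt, bias=0.0):
    """Slide one filter over a sentence and keep the strongest response.

    embeddings: list of word vectors (one list of floats per word).
    filt: list of `width` weight vectors, same dimension as the embeddings.
    """
    width = len(filt)
    if len(embeddings) < width:
        raise ValueError("sentence shorter than filter width")
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias
        for word_vec, weight_vec in zip(window, filt):
            total += sum(x * w for x, w in zip(word_vec, weight_vec))
        responses.append(max(total, 0.0))  # ReLU activation (assumed)
    # Max-over-time pooling: one feature per filter, whatever the length.
    return max(responses)


# Toy sentence of three 2-dimensional word vectors and one bigram filter.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
feature = conv_max_over_time(sentence, bigram_filter)
print(feature)
```

Because pooling reduces each filter to a single number, sentences of different lengths yield feature vectors of the same size, which a classifier layer can then consume.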

Citation

If you use this code for any purpose, please cite the following paper:

Peris Á., Chinea-Rios M., Casacuberta F.
Neural Networks Classifier for Data Selection in Statistical Machine Translation.
In The Prague Bulletin of Mathematical Linguistics, No. 108, pp. 283–294. 2017.

Contact

Álvaro Peris (web page): lvapeab@prhlt.upv.es