Character Based CNN

This repo contains a PyTorch implementation of a character-level convolutional neural network for text classification.

The model architecture comes from this paper: https://arxiv.org/pdf/1509.01626.pdf

Network architecture

There are two variants, a large one and a small one. You can switch between them by changing the configuration file.

This architecture has 6 convolutional layers:

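| Layer | Large Feature | Small Feature | Kernel | Pool |
|-------|---------------|---------------|--------|------|
| 1 | 1024 | 256 | 7 | 3 |
| 2 | 1024 | 256 | 7 | 3 |
| 3 | 1024 | 256 | 3 | N/A |
| 4 | 1024 | 256 | 3 | N/A |
| 5 | 1024 | 256 | 3 | N/A |
| 6 | 1024 | 256 | 3 | 3 |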

and 3 fully connected layers, the last of which is the output layer:

| Layer | Output Units Large | Output Units Small |
|-------|--------------------|--------------------|
| 7 | 2048 | 1024 |
| 8 | 2048 | 1024 |
| 9 | Depends on the problem | Depends on the problem |
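
To see how the kernel and pool sizes above shrink the input, here is a minimal sketch that computes the sequence length left after the six convolutional layers, and therefore the size of the vector entering layer 7. It assumes stride-1 convolutions without padding and non-overlapping max pooling, as described in the paper; this repo's actual settings may differ. The function names are hypothetical, and 1014 is the input length used in the paper.

```python
# Sketch only: assumes stride-1 convolutions without padding and
# non-overlapping max pooling (stride equal to the pool size).

# (kernel size, pool size or None) for the six convolutional layers.
CONV_LAYERS = [(7, 3), (7, 3), (3, None), (3, None), (3, None), (3, 3)]


def conv_output_length(input_length, layers=CONV_LAYERS):
    """Return the sequence length left after all conv and pool layers."""
    length = input_length
    for kernel, pool in layers:
        length = length - kernel + 1      # unpadded, stride-1 convolution
        if pool is not None:
            length = length // pool       # non-overlapping max pooling
        if length <= 0:
            raise ValueError(f"input of {input_length} characters is too short")
    return length


def flattened_features(input_length, n_filters):
    """Size of the vector fed to the first fully connected layer."""
    return conv_output_length(input_length) * n_filters


if __name__ == "__main__":
    print(conv_output_length(1014))        # 34
    print(flattened_features(1014, 1024))  # large variant: 34816
    print(flattened_features(150, 256))    # small variant: 512
```

Under these assumptions, very short inputs leave nothing for the fully connected layers, which is why the sketch raises an error instead of returning zero.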

Video tutorial

If you're interested in how character-level CNNs work, or want to see a demo of this project, you can watch my YouTube video tutorial.

<p align="center"> <a href="https://www.youtube.com/watch?v=CNY8VjJt-iQ"> <img src="https://img.youtube.com/vi/CNY8VjJt-iQ/0.jpg"> </a> </p>

Why you should care about character-level CNNs

They have very nice properties:

Training a sentiment classifier on French customer reviews

I tested this model on a set of labeled French customer reviews (over 3 million rows) and logged the metrics with TensorboardX.

I got the following results:

|       | F1 score | Accuracy |
|-------|----------|----------|
| train | 0.965 | 0.9366 |
| test | 0.945 | 0.915 |
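
For reference, here is a minimal sketch of how accuracy and F1 can be computed for 0/1 labels. It assumes label 1 is the positive class; the README does not say which class was treated as positive or how F1 was averaged, so this is not necessarily how the figures above were produced. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for 0/1 labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    denominator = 2 * tp + fp + fn
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


accuracy, f1 = binary_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
print(f"accuracy={accuracy:.4f} f1={f1:.4f}")  # accuracy=0.6667 f1=0.7500
```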

Training metrics

Dependencies

Structure of the code

At the root of the project, you will have:

How to use the code

Training

The code currently works only with binary labels (0/1).

Launch train.py with the following arguments:

Example usage:

python train.py --data_path=/data/tweets.csv --max_rows=200000
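
As an illustration of what `--max_rows` and the binary-label restriction imply for the input file, here is a minimal sketch that reads at most a given number of rows from a CSV and rejects any label other than 0 or 1. It is not the loader used by train.py: the function name and the `text` and `label` column names are hypothetical, so adapt them to your data.

```python
import csv
import io


def read_labeled_rows(handle, max_rows=None, text_column="text", label_column="label"):
    """Read up to max_rows (text, label) pairs and check that labels are 0 or 1."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        if max_rows is not None and len(texts) >= max_rows:
            break                          # stop once the row cap is reached
        label = int(row[label_column])
        if label not in (0, 1):
            raise ValueError(f"expected a binary label, got {label!r}")
        texts.append(row[text_column])
        labels.append(label)
    return texts, labels


# In-memory sample standing in for a real file opened with open(path, newline="").
sample = io.StringIO("text,label\ngreat product,1\nnever again,0\nloved it,1\n")
texts, labels = read_labeled_rows(sample, max_rows=2)
print(texts, labels)  # ['great product', 'never again'] [1, 0]
```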

Plotting results to TensorboardX

Run this command at the root of the project:

tensorboard --logdir=./logs/ --port=6006

Then go to: http://localhost:6006 (or whatever host you're using)

Prediction

Launch predict.py with the following arguments:

Example usage:

python predict.py ./models/pretrained_model.pth --text="I love pizza !" --max_length=150
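
The `--max_length` argument suggests that the text is brought to a fixed number of characters before it reaches the network. Here is a minimal sketch of one way to do that, following the one-hot character encoding described in the paper: long text is truncated, short text is zero-padded, and characters outside the alphabet become all-zero rows. The alphabet, the lowercasing step, and the function name are assumptions for illustration; predict.py may encode text differently.

```python
import string

# Hypothetical alphabet for illustration; the repo's actual alphabet may differ.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
CHAR_TO_INDEX = {char: i for i, char in enumerate(ALPHABET)}


def encode_text(text, max_length=150, lowercase=True):
    """Return a max_length x len(ALPHABET) one-hot matrix as nested lists."""
    if lowercase:
        text = text.lower()
    matrix = [[0] * len(ALPHABET) for _ in range(max_length)]
    for position, char in enumerate(text[:max_length]):  # truncate long text
        index = CHAR_TO_INDEX.get(char)
        if index is not None:          # unknown characters stay all-zero
            matrix[position][index] = 1
    return matrix                      # unused trailing rows act as zero padding


encoded = encode_text("I love pizza !", max_length=150)
print(len(encoded), len(encoded[0]))   # 150 69
print(sum(map(sum, encoded)))          # 14 characters were encoded
```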

Download pretrained models

Contributions - PRs are welcome:

Here's a non-exhaustive list of potential future features to add:

License

This project is licensed under the MIT License.