Caffe2 C++ Tutorials and Examples

C++ transcripts of the Caffe2 Python tutorials and other C++ example code.

About

Caffe2 has a strong C++ core but most tutorials only cover the outer Python layer of the framework. This project aims to provide example code written in C++, complementary to the Python documentation and tutorials. It covers verbatim transcriptions of most of the Python tutorials and other example applications.

Some higher-level tools, like brewing models and adding gradient operations, are currently not available in Caffe2's C++. This repo therefore provides some model helpers and other utilities as replacements, which are probably just as helpful as the actual tutorials. You can find them in include/caffe2/util and src/caffe2/util.

Check out the original Caffe2 Python tutorials at https://caffe2.ai/docs/tutorials.html.

Build

Install dependencies

Install the dependencies: CMake, glog, protobuf, leveldb, OpenCV and Eigen. If you're on macOS, use Homebrew:

 brew install cmake glog protobuf leveldb opencv eigen

On Ubuntu:

 apt-get install cmake libgoogle-glog-dev libprotobuf-dev libleveldb-dev libopencv-dev libeigen3-dev curl

In case you're using CUDA and run into CMake issues with NCCL, try adding this to your .bashrc (assuming Caffe2 is at $HOME/caffe2):

 export CMAKE_LIBRARY_PATH=$CMAKE_LIBRARY_PATH:$HOME/caffe2/third_party/nccl/build/lib

Install Caffe2

Follow the Caffe2 installation instructions: https://caffe2.ai/docs/getting-started.html

Build using CMake

This project uses CMake. However, the easiest way to build the whole thing is simply:
```
 make
```
Internally it creates a build folder and runs CMake from there. This also downloads the resources that are required for running some of the tutorials.

Check out the Build alternatives section below if you wish to be more involved in the build process.

Note: sources are developed and tested on macOS and Ubuntu.

Intro Tutorial

The Intro Tutorial covers the basic building blocks of Caffe2. This tutorial is transcribed in intro.cc.

Make sure to first run make. Then run the intro tutorial:

./bin/intro

This should output some numbers, including a loss of about 2.2.

Toy Regression

One of the most basic machine learning tasks is linear regression (LR). The Toy Regression tutorial shows how to get accurate results with a two-parameter model. This tutorial is transcribed in toy.cc.

Run the toy regression model:

./bin/toy

This performs 100 steps of training, after which the printed "W after" values should approximate "W ground truth".
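The idea behind the tutorial can be sketched in plain C++ without any Caffe2 calls: gradient descent on the squared error of y = W0*x0 + W1*x1 + B, drawing a fresh random batch at every step. This is a minimal sketch, not a transcription of toy.cc; the ground-truth values (kWTruth, kBTruth), the batch size of 64 and the learning rate are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical ground truth used to generate the synthetic data.
const std::array<float, 2> kWTruth = {2.0f, 1.5f};
const float kBTruth = 0.5f;

struct LinearModel {
  std::array<float, 2> W = {0.0f, 0.0f};
  float B = 0.0f;

  float predict(const std::array<float, 2>& x) const {
    return W[0] * x[0] + W[1] * x[1] + B;
  }
};

// Gradient descent on half the mean squared error, with a new random
// batch of 64 samples at every step.
LinearModel train(int steps, float learning_rate, unsigned seed) {
  std::mt19937 rng(seed);
  std::normal_distribution<float> normal(0.0f, 1.0f);
  const int batch = 64;
  LinearModel model;
  for (int step = 0; step < steps; ++step) {
    std::array<float, 2> grad_W = {0.0f, 0.0f};
    float grad_B = 0.0f;
    for (int i = 0; i < batch; ++i) {
      std::array<float, 2> x = {normal(rng), normal(rng)};
      float y = kWTruth[0] * x[0] + kWTruth[1] * x[1] + kBTruth;
      float err = model.predict(x) - y;  // derivative of 0.5 * err^2
      grad_W[0] += err * x[0];
      grad_W[1] += err * x[1];
      grad_B += err;
    }
    model.W[0] -= learning_rate * grad_W[0] / batch;
    model.W[1] -= learning_rate * grad_W[1] / batch;
    model.B -= learning_rate * grad_B / batch;
  }
  return model;
}
```

Calling train(100, 0.1f, 42) mirrors the 100 steps above and should return weights close to the hypothetical ground truth.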

Loading Pre-Trained Models

Often training can be skipped by using a pre-trained model. The Model Zoo contains a few of the popular models, although many are only available for Caffe. Use caffe_translator.py to convert models to Caffe2. See Caffe2 Models for more info.

The Loading Pre-Trained Models tutorial shows how to use these models to classify images. This tutorial and more are covered in pretrained.cc. The code takes an input image and classifies its content. By default it uses the image in res/image_file.jpg. Make sure the pre-trained SqueezeNet model is present in res/squeezenet_*_net.pb.

To run:

./bin/pretrained

This should output something along the lines of 96% 'daisy'.
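The reported percentage is the softmax probability of the highest-scoring class. A minimal sketch of turning a probability vector into a ranked top-k list, assuming the network's final output is already a softmax vector; the labels in the usage below are hypothetical, and this is not the code in pretrained.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Returns the indices of the k largest probabilities, highest first.
std::vector<int> top_k(const std::vector<float>& probs, int k) {
  std::vector<int> idx(probs.size());
  std::iota(idx.begin(), idx.end(), 0);
  k = std::min<int>(k, static_cast<int>(idx.size()));
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](int a, int b) { return probs[a] > probs[b]; });
  idx.resize(k);
  return idx;
}

// Formats one result as e.g. "96% 'daisy'" (probability rounded to a percent).
std::string describe(const std::vector<float>& probs,
                     const std::vector<std::string>& labels, int index) {
  int percent = static_cast<int>(probs[index] * 100.0f + 0.5f);
  return std::to_string(percent) + "% '" + labels[index] + "'";
}
```

For probabilities {0.01, 0.96, 0.03} and labels {"rose", "daisy", "tulip"}, top_k(p, 1) picks index 1 and describe formats it as 96% 'daisy'.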

To classify giraffe.jpg:

./bin/pretrained --file giraffe.jpg

This tutorial is also a good test to see if OpenCV is working properly.

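To export a model from Python (my_model is a placeholder name):

```python
from caffe2.python import model_helper

model = model_helper.ModelHelper(name="my_model")  # placeholder; build your layers on it
with open("init_net.pb", "wb") as f:
    f.write(model.param_init_net.Proto().SerializeToString())
with open("predict_net.pb", "wb") as f:
    f.write(model.net.Proto().SerializeToString())
```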

See also the Caffe2 Image Pre-Processing tutorial.
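OpenCV stores a decoded image as interleaved 8-bit BGR pixels (HWC order), whereas Caffe2 image models such as SqueezeNet generally take a planar float tensor in NCHW order. A minimal sketch of that conversion with mean subtraction, written without OpenCV so it stands alone; the function name and the mean value of 128 used in the usage are assumptions, so check what your model was trained with.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Converts an interleaved HWC byte image (OpenCV's layout) into a planar
// CHW float buffer, subtracting `mean` from every value. Channel order is
// kept as-is, so BGR input stays BGR.
std::vector<float> hwc_to_chw(const std::vector<std::uint8_t>& hwc, int height,
                              int width, int channels, float mean) {
  std::vector<float> chw(hwc.size());
  for (int h = 0; h < height; ++h) {
    for (int w = 0; w < width; ++w) {
      for (int c = 0; c < channels; ++c) {
        float value = static_cast<float>(hwc[(h * width + w) * channels + c]);
        chw[(c * height + h) * width + w] = value - mean;
      }
    }
  }
  return chw;
}
```

For a batch of N images the CHW buffers are concatenated, so element (n, c, h, w) lives at offset ((n * C + c) * H + h) * W + w.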

MNIST - Create a CNN from Scratch

A classical machine learning dataset is the MNIST database of handwritten digits by Yann LeCun. The Caffe2 tutorial MNIST - Create a CNN from Scratch shows how to build a basic convolutional neural network (CNN) to recognize these handwritten digits. This tutorial is transcribed in mnist.cc. Note that this and the following tutorials rely on utility functions defined in caffe2/util.

Make sure the database folders res/mnist-*-nchw-leveldb are present. These should have been generated by the download_resource.sh script. Then run:

./bin/mnist

This performs 100 training runs, which should provide about 90% accuracy.
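Accuracy here means the fraction of test images whose highest-scoring class matches the label. A minimal sketch of that computation, assuming the class scores for a batch are stored row-major as batch x classes (for MNIST, classes is 10); this is an illustration, not the code mnist.cc uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fraction of rows in a row-major [batch x classes] score matrix whose
// argmax equals the corresponding label.
float accuracy(const std::vector<float>& scores, const std::vector<int>& labels,
               int classes) {
  int correct = 0;
  for (std::size_t n = 0; n < labels.size(); ++n) {
    auto row = scores.begin() + static_cast<long>(n) * classes;
    int predicted = static_cast<int>(std::max_element(row, row + classes) - row);
    if (predicted == labels[n]) ++correct;
  }
  return labels.empty() ? 0.0f : static_cast<float>(correct) / labels.size();
}
```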

To see the training in action, run with --display:

./bin/mnist --display

After testing, the trained model is stored in tmp/mnist_init_net.pb and tmp/mnist_predict_net.pb. For an example implementation of how to use this trained model, take a look at predict_example() in mnist.cc. This implementation is "pure" Caffe2 and does not rely on any helper methods. For an implementation that does use helper methods, take a look at the imagenet example.

RNNs and LSTM Networks

In The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy describes how to train a recurrent neural network (RNN) on a large volume of text and how to generate new text using such a network. The Caffe2 tutorial RNNs and LSTM Networks covers this technique using the char_rnn.py script.

This tutorial is transcribed in rnn.cc. It takes the same parameters as used in the tutorial. First make sure the file res/shakespeare.txt is present. Then run:
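The network works at the character level: every distinct character in the training text becomes one input dimension, and each character is fed to the LSTM as a one-hot vector. A minimal sketch of that encoding; sorting the vocabulary is a choice made here for reproducibility and is not necessarily the order rnn.cc uses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps every distinct character of `text` to an index, in sorted order.
std::map<char, int> build_vocab(const std::string& text) {
  std::set<char> chars(text.begin(), text.end());
  std::map<char, int> vocab;
  int index = 0;
  for (char c : chars) vocab[c] = index++;
  return vocab;
}

// One-hot encodes `seq` as a row-major [seq.size() x vocab.size()] matrix.
std::vector<float> one_hot(const std::string& seq,
                           const std::map<char, int>& vocab) {
  const std::size_t dim = vocab.size();
  std::vector<float> out(seq.size() * dim, 0.0f);
  for (std::size_t t = 0; t < seq.size(); ++t) {
    out[t * dim + vocab.at(seq[t])] = 1.0f;
  }
  return out;
}
```

A window of --seq_length characters encoded this way has seq_length x vocabulary-size entries, with exactly one 1 per row.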

./bin/rnn

In contrast to the tutorial, this script terminates after 10K iterations. To get more, use --iters:

./bin/rnn --iters 100000

To get better results (loss < 1), expand the hidden layer:

./bin/rnn --iters 100000 --batch 32 --hidden_size 512 --seq_length 32

The file res/dickens.txt contains a larger volume of text. Because the writing is a bit more recent, it's more challenging to generate convincing results. Also, single newlines are stripped to allow for more creativity.
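Stripping single newlines turns hard-wrapped lines into running text, so the model does not have to learn where the original line breaks fell. A minimal sketch, assuming a single newline is replaced by a space while runs of two or more newlines (paragraph breaks) are kept; rnn.cc may handle this differently.

```cpp
#include <cassert>
#include <string>

// Replaces each isolated '\n' with a space; keeps runs of 2+ newlines intact.
std::string strip_single_newlines(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] != '\n') {
      out += text[i];
      continue;
    }
    std::size_t run = 0;
    while (i + run < text.size() && text[i + run] == '\n') ++run;
    if (run == 1) {
      out += ' ';
    } else {
      out.append(run, '\n');
    }
    i += run - 1;  // skip the rest of the run; the loop's ++i moves past it
  }
  return out;
}
```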

./bin/rnn --iters 100000 --batch 32 --hidden_size 768 --seq_length 32 --train_data res/dickens.txt

After 200K iterations, the loss still has not dropped below 36, in contrast to the Shakespeare text. Perhaps this requires an additional hidden layer in the LSTM model.

ImageNet Classifiers

Much of the progress in image recognition is published after the yearly ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This competition is based on the ImageNet dataset, which is a large volume of labeled images of nearly everything. The models for this challenge form the basis of much image recognition and processing research. One of the most basic challenges is classifying an image, which is covered in this example.

To classify the content of an image, run:

./bin/imagenet --model resnet101 --file res/image_file.jpg

Where the model name is one of the following:

alexnet: AlexNet
googlenet: GoogleNet
squeezenet: SqueezeNet
vgg16 and vgg19: VGG Team
resnet50, resnet101, resnet152: MSRA
mobilenet, mobilenet50, mobilenet25: MobileNet

The pre-trained weights for these models are automatically downloaded and stored in the res/ folder. If you wish to download all models in one go, run:

./script/download_extra.sh

Additional models can be made available on request!

To classify an image using a model that you trained yourself, specify the path of the init and predict .pb files, with a % character in place of init/predict. For example:

./bin/imagenet --model res/mobilenet_%_net.pb --file res/image_file.jpg
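A Caffe2 model is stored as two protobuf files: an init net that holds the trained parameters and a predict net that describes the operators. The % stands in for init and predict respectively. A minimal sketch of that expansion; expand_model_path is a hypothetical helper, not the repo's actual function.

```cpp
#include <cassert>
#include <string>

// Replaces the first '%' in `pattern` with `kind` ("init" or "predict").
// Returns the pattern unchanged if it contains no '%'.
std::string expand_model_path(const std::string& pattern,
                              const std::string& kind) {
  std::string path = pattern;
  std::string::size_type pos = path.find('%');
  if (pos != std::string::npos) path.replace(pos, 1, kind);
  return path;
}
```

With the example above, res/mobilenet_%_net.pb expands to res/mobilenet_init_net.pb and res/mobilenet_predict_net.pb.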

Fast Retrain

The article DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition describes how to get good results on new datasets with minimal training efforts by reusing trained parameters of an existing model. For example, the above models are all trained on ImageNet data, which means they will only be able to classify ImageNet labels. However, by retraining just the top half of the model we can get high accuracy in a fraction of the time. If the image data has similar characteristics, it's possible to get good results by only retraining the top 'FC' layer.

First sort all images into subfolders, using each label as the folder name. Then, to retrain the final layer of GoogleNet:
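With this layout, the class label of an image is simply the name of the folder it sits in; for example, res/images/dog/Tjoise.jpg has the label dog. A minimal sketch of deriving label indices from image paths relative to the --folder root. Assigning indices in alphabetical order is an assumption made here, so rely on the generated classes file rather than this ordering.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the first path component, e.g. "dog" for "dog/Tjoise.jpg".
std::string label_of(const std::string& relative_path) {
  return relative_path.substr(0, relative_path.find('/'));
}

// Assigns each distinct label an index in alphabetical order.
std::map<std::string, int> label_indices(const std::vector<std::string>& paths) {
  std::set<std::string> labels;
  for (const auto& p : paths) labels.insert(label_of(p));
  std::map<std::string, int> indices;
  int next = 0;
  for (const auto& label : labels) indices[label] = next++;
  return indices;
}
```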

./bin/train --model googlenet --folder res/images --layer pool5/7x7_s1

The script starts out by collecting all images and running them through the pre-trained part of the model. This allows for very fast training on the pre-processed image data.
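Because the frozen layers never change, each image's activation at --layer only has to be computed once; retraining then only updates the small top layer. A minimal sketch of that second phase: a softmax (multinomial logistic regression) layer trained with plain gradient descent on cached feature vectors. The feature values, dimensions and learning rate in the usage are hypothetical, and the repo's own training code is built on Caffe2 operators instead.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// A single fully connected layer followed by softmax. Its inputs are feature
// vectors computed once by the frozen part of the network and then cached.
struct SoftmaxLayer {
  int dim, classes;
  std::vector<float> W, b;  // W is [classes x dim], row-major

  SoftmaxLayer(int d, int k) : dim(d), classes(k), W(d * k, 0.0f), b(k, 0.0f) {}

  std::vector<float> probs(const std::vector<float>& x) const {
    std::vector<float> p(classes);
    float max_logit = -std::numeric_limits<float>::infinity();
    for (int k = 0; k < classes; ++k) {
      float z = b[k];
      for (int d = 0; d < dim; ++d) z += W[k * dim + d] * x[d];
      p[k] = z;
      if (z > max_logit) max_logit = z;
    }
    float sum = 0.0f;
    for (float& v : p) {
      v = std::exp(v - max_logit);  // subtract the max for numerical stability
      sum += v;
    }
    for (float& v : p) v /= sum;
    return p;
  }

  // One gradient-descent step on the mean cross-entropy over the cached set.
  void step(const std::vector<std::vector<float>>& features,
            const std::vector<int>& labels, float lr) {
    if (features.empty()) return;
    std::vector<float> gW(W.size(), 0.0f), gb(classes, 0.0f);
    for (std::size_t n = 0; n < features.size(); ++n) {
      std::vector<float> p = probs(features[n]);
      p[labels[n]] -= 1.0f;  // dLoss/dLogits = softmax - one_hot(label)
      for (int k = 0; k < classes; ++k) {
        gb[k] += p[k];
        for (int d = 0; d < dim; ++d) gW[k * dim + d] += p[k] * features[n][d];
      }
    }
    const float scale = lr / static_cast<float>(features.size());
    for (std::size_t i = 0; i < W.size(); ++i) W[i] -= scale * gW[i];
    for (int k = 0; k < classes; ++k) b[k] -= scale * gb[k];
  }

  int predict(const std::vector<float>& x) const {
    std::vector<float> p = probs(x);
    int best = 0;
    for (int k = 1; k < classes; ++k)
      if (p[k] > p[best]) best = k;
    return best;
  }
};
```

Each step costs only one small matrix product per cached feature vector, which is why this phase is so much faster than a full forward and backward pass through the network.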

If you have more (GPU) power at your disposal, retrain VGG16's final 2 layers:

./bin/train --model vgg16 --folder res/images --layer fc6

Add --display for training visualization:

./bin/train --model googlenet --folder res/images --layer pool5/7x7_s1 --display

Some models, like SqueezeNet, require their output to be reshaped to an N x D tensor:

./bin/train --model squeezenet --folder res/images --layer fire9/concat --reshape
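Convolutional layers produce a 4-D N x C x H x W tensor, while a fully connected layer works on a 2-D N x D matrix with D = C x H x W. Since NCHW data is contiguous per image, the reshape only changes the shape, not the data. A minimal sketch of the shape computation; flatten_to_2d is a hypothetical helper, and the 512 x 13 x 13 dimensions in the usage are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Collapses an N x C x H x W shape (or any shape of rank >= 1) into N x D,
// where D is the product of all non-batch dimensions. The underlying
// contiguous data needs no copying; only the shape changes.
std::vector<std::int64_t> flatten_to_2d(const std::vector<std::int64_t>& dims) {
  std::int64_t d = 1;
  for (std::size_t i = 1; i < dims.size(); ++i) d *= dims[i];
  return {dims.at(0), d};
}
```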

You can also provide your own pre-trained model. Specify the path of the init and predict .pb files, with a % character in place of init/predict:

./bin/train --model res/googlenet_%_net.pb --folder res/images --layer pool5/7x7_s1

After the test runs, the model is saved in the --folder directory under the name _<layer>_<model>_<init/predict>_net.pb, with any / in the layer name replaced by _. A class .txt file is generated alongside it; pass that file with the --classes flag when using the model. You can now use this model like any other, for example in the imagenet example:

./bin/imagenet --model res/images/_pool5_7x7_s1_googlenet_%_net.pb --file res/images/dog/Tjoise.jpg --classes res/images/_pool5_7x7_s1_classes.txt
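Judging from the example above, the output name combines the --layer and --model values, with the / in the layer name replaced by _. A minimal sketch of that naming scheme; retrained_model_path is a hypothetical helper, so check the files actually written to your --folder.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Builds <folder>/_<layer>_<model>_<kind>_net.pb, where kind is "init",
// "predict" or "%", and any '/' in the layer name becomes '_'.
std::string retrained_model_path(const std::string& folder, std::string layer,
                                 const std::string& model,
                                 const std::string& kind) {
  std::replace(layer.begin(), layer.end(), '/', '_');
  return folder + "/_" + layer + "_" + model + "_" + kind + "_net.pb";
}
```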

Another example implementation of a classifier can be found in mnist.cc, see predict_example().

Training from scratch

To fully train an existing image classification model from scratch, run without the --layer option:

./bin/train --model resnet50 --folder res/images

The models currently available for training are the ones listed in the ImageNet section. This will take a lot of time, even when running on the GPU.

Some models, like SqueezeNet, require their output to be reshaped to an N x D tensor:

./bin/train --model squeezenet --folder res/images --reshape

Deep Dream

The article Inceptionism: Going Deeper into Neural Networks describes how hidden layers of a CNN can be visualized by training just the input tensor, based on the mean value of a particular channel. This technique is known for producing remarkable imagery and can be combined with existing images. This is referred to as Deep Dream.

NB: this code is still a bit buggy and generated images are not representative of the original Deep Dream implementation.
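The core of the technique is gradient ascent on the input instead of on the weights. A minimal sketch on a toy one-layer network (a 1x1 convolution followed by ReLU, with hypothetical weights) that nudges the input so the mean activation of one output channel grows. The ToyLayer type, step size and [0, 1] clamping range are illustrative assumptions, not what dream.cc uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy layer: 1x1 convolution (weights [out_channels x in_channels]) + ReLU,
// applied to a planar [in_channels x pixels] input.
struct ToyLayer {
  int in_channels, out_channels;
  std::vector<float> weights;

  float preactivation(const std::vector<float>& x, int pixels, int c, int p) const {
    float z = 0.0f;
    for (int k = 0; k < in_channels; ++k)
      z += weights[c * in_channels + k] * x[k * pixels + p];
    return z;
  }

  // Objective: mean ReLU activation of output channel c over all pixels.
  float channel_mean(const std::vector<float>& x, int pixels, int c) const {
    float sum = 0.0f;
    for (int p = 0; p < pixels; ++p)
      sum += std::max(0.0f, preactivation(x, pixels, c, p));
    return sum / pixels;
  }

  // One gradient-ascent step on the input (not the weights), clamped to [0, 1].
  void dream_step(std::vector<float>& x, int pixels, int c, float step) const {
    std::vector<float> grad(x.size(), 0.0f);
    for (int p = 0; p < pixels; ++p) {
      if (preactivation(x, pixels, c, p) <= 0.0f) continue;  // ReLU gradient is 0
      for (int k = 0; k < in_channels; ++k)
        grad[k * pixels + p] = weights[c * in_channels + k] / pixels;
    }
    for (std::size_t i = 0; i < x.size(); ++i)
      x[i] = std::min(1.0f, std::max(0.0f, x[i] + step * grad[i]));
  }
};
```

In a real network the gradient is obtained by backpropagating from --layer down to the input through all preceding layers, but the input is updated in the same gradient-ascent fashion.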

To render channel 139 of the inception_4d/3x3_reduce layer in GoogleNet:

./bin/dream --model googlenet --layer inception_4d/3x3_reduce --channel 139

The resulting image will be written to tmp/. To visualize the process, add --display:

./bin/dream --model googlenet --layer inception_4d/3x3_reduce --channel 139 --display

NB: these images differ from the ones presented in Google's article.

Multiple channels can be rendered in parallel by increasing the batch size:

./bin/dream --model googlenet --layer inception_4d/3x3_reduce --channel 133 --display --batch 11

If you have more (GPU) power at your disposal, try the first channel of the conv3_1 layer in VGG16:

./bin/dream --model vgg16 --layer conv3_1 --channel 0

You can also provide your own pre-trained model. Specify the path of the init and predict .pb files, with a % character in place of init/predict:

./bin/dream --model res/squeezenet_%_net.pb --layer fire9/concat --channel 100 --display

We can also do some dreaming on MNIST. First train the MNIST model. Then run:

./bin/dream --model tmp/mnist_%_net.pb --layer conv2 --size 28 --channel 0 --batch 50 --display

Plots

Some of the examples have a --display option, which will show an OpenCV window with images and plots covering the training progress. These graphs are drawn using the cvplot framework.

Troubleshooting

See http://rpg.ifi.uzh.ch/docs/glog.html for more info on logging. Try running the tools and examples with --logtostderr=1, --caffe2_log_level=1, and --v=1.

Build alternatives

The easiest way to build all sources is to run:

make

To run these steps manually:

mkdir -p build
cd build
cmake ..
make
cd ..
./script/download_resource.sh

Compiling the tutorials and examples individually can be a little more involved. One way to get more understanding of what CMake does internally is by running:

cd build
make VERBOSE=1
cd ..

The first three tutorials (intro, toy and pretrained) can be compiled without CMake quite easily. For example, pretrained on macOS:

c++ src/caffe2/binaries/pretrained.cc -o bin/pretrained -std=gnu++11 -Iinclude -I/usr/local/include/eigen3 -I/usr/local/include/opencv -lgflags -lglog -lprotobuf -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_imgcodecs -lCaffe2_CPU

And pretrained on Ubuntu:

c++ src/caffe2/binaries/pretrained.cc -o bin/pretrained -std=gnu++11 -Iinclude -I/usr/include/eigen3 -I/usr/include/opencv -lgflags -lglog -lprotobuf -lopencv_core -lopencv_imgproc -lopencv_highgui -lCaffe2_CPU

Other examples require the compilation of additional .cc files. Take a look at the verbose output of cd build && make VERBOSE=1.