LIBCP -- A Library for Conformal Prediction

LibCP is a simple, easy-to-use, and efficient software package for conformal prediction on classification problems. It outputs each prediction together with its confidence and credibility. It supports conformal prediction in both online and batch mode, with k-nearest neighbors as the underlying algorithm. This document explains how to use LibCP.
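
As a rough illustration of what confidence and credibility mean, the sketch below follows a convention that is common in the conformal prediction literature and is assumed here rather than taken from LibCP's source: the predicted label is the one with the largest p-value, credibility is that largest p-value, and confidence is one minus the second largest p-value. The names ForcedPrediction and summarize are hypothetical and are not part of LibCP.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not a LibCP structure).
struct ForcedPrediction {
  std::size_t label_index;  // index of the label with the largest p-value
  double confidence;        // 1 - second largest p-value
  double credibility;       // largest p-value
};

// p_values[i] is the conformal p-value of candidate label i.
// Assumes p_values holds at least one entry.
ForcedPrediction summarize(const std::vector<double>& p_values) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < p_values.size(); ++i) {
    if (p_values[i] > p_values[best]) best = i;
  }
  double second = 0.0;  // stays 0 when there is only one candidate label
  for (std::size_t i = 0; i < p_values.size(); ++i) {
    if (i != best && p_values[i] > second) second = p_values[i];
  }
  return {best, 1.0 - second, p_values[best]};
}
```

Under this convention, high confidence means the runner-up label is implausible, while low credibility means that even the best label fits the training data poorly.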

Installation and Data Format

On Unix systems, type make to build the cp-offline, cp-online and cp-cv programs. Run any of them without arguments to display its usage.

The format of the training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...
...
...
...

Each line contains one instance and ends with a '\n' character (Unix line ending). For classification, <label> is an integer indicating the class label (multi-class is supported). For regression, <label> is the target value, which can be any real number. Each <index>:<value> pair gives a feature (attribute) value: <index> is an integer starting from 1 and <value> is the value of the attribute, which can be an integer or a real number. Indices must be in ASCENDING order. Labels in the testing file are used only to calculate accuracies and errors. If they are unknown, fill the first column with any number.
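
To make the format concrete, here is a minimal sketch of a parser for one such line, including the ascending-index check. It is an illustration only: SparseFeature, ParsedInstance and parse_instance_line are hypothetical names, and LibCP's own Node and Problem structures in utilities.h may be laid out differently.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for this sketch (not LibCP's Node/Problem).
struct SparseFeature {
  int index;     // 1-based feature index
  double value;  // feature value
};

struct ParsedInstance {
  double label;
  std::vector<SparseFeature> features;
};

// Parses one line of the form "<label> <index1>:<value1> <index2>:<value2> ...".
// Throws std::invalid_argument on malformed input or non-ascending indices
// (std::stoi and std::stod may also throw std::out_of_range).
ParsedInstance parse_instance_line(const std::string& line) {
  std::istringstream in(line);
  ParsedInstance inst;
  if (!(in >> inst.label)) {
    throw std::invalid_argument("missing or invalid label");
  }
  std::string token;
  int previous_index = 0;  // indices start at 1, so 0 is a safe lower bound
  while (in >> token) {
    const std::size_t colon = token.find(':');
    if (colon == std::string::npos) {
      throw std::invalid_argument("expected <index>:<value>, got: " + token);
    }
    SparseFeature f;
    f.index = std::stoi(token.substr(0, colon));
    f.value = std::stod(token.substr(colon + 1));
    if (f.index <= previous_index) {
      throw std::invalid_argument("indices must be in ascending order");
    }
    previous_index = f.index;
    inst.features.push_back(f);
  }
  return inst;
}
```

For example, parse_instance_line("2 1:0.5 3:-1.25") yields label 2 with two features; feature 2 is simply absent, which is how the sparse format omits entries.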

The package includes a sample classification data set: iris_scale for training and iris_scale_t for testing.

Type cp-offline iris_scale iris_scale_t, and the program will read the training and testing data and write the results to the file iris_scale_t_output by default. The model file iris_scale_model is not saved by default; however, adding -s model_file_name to the options saves the model to model_file_name. The output file contains the predicted labels and the lower and upper bounds of probabilities for each predicted label.

"cp-offline" Usage

Usage: cp-offline [options] train_file test_file [output_file]
options:
  -t non-conformity measure : set type of NCM (default 0)
    0 -- k-nearest neighbors (KNN)
  -k num_neighbors : set number of neighbors in kNN (default 1)
  -s model_file_name : save model
  -l model_file_name : load model
  -e epsilon : set significance level (default 0.05)

train_file is the data you want to train with.
test_file is the data you want to predict.
cp-offline writes its results to output_file; if output_file is omitted, a default output file is used (iris_scale_t_output in the example above).
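
The -e option sets the significance level epsilon. In conformal prediction this level is typically used to form a prediction set that contains every label whose p-value exceeds epsilon. This document does not say whether cp-offline reports such a set, so the following is only a sketch of the general idea; prediction_set is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of all candidate labels whose p-value is strictly
// greater than the significance level epsilon.
std::vector<std::size_t> prediction_set(const std::vector<double>& p_values,
                                        double epsilon) {
  std::vector<std::size_t> accepted;
  for (std::size_t i = 0; i < p_values.size(); ++i) {
    if (p_values[i] > epsilon) accepted.push_back(i);
  }
  return accepted;
}
```

A smaller epsilon gives larger prediction sets that miss the true label less often; a larger epsilon gives smaller sets and more errors.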

"cp-online" Usage

Usage: cp-online [options] data_file [output_file]
options:
  -t non-conformity measure : set type of NCM (default 0)
    0 -- k-nearest neighbors (KNN)
  -k num_neighbors : set number of neighbors in kNN (default 1)
  -e epsilon : set significance level (default 0.05)

data_file is the data you want to run the online prediction on.
cp-online writes its results to output_file; if output_file is omitted, a default output file is used.

"cp-cv" Usage

Usage: cp-cv [options] data_file [output_file]
options:
  -t non-conformity measure : set type of NCM (default 0)
    0 -- k-nearest neighbors (KNN)
  -k num_neighbors : set number of neighbors in kNN (default 1)
  -v num_folds : set number of folds in cross validation (default 5)
  -e epsilon : set significance level (default 0.05)

data_file is the data you want to run the cross validation on.
cp-cv writes its results to output_file; if output_file is omitted, a default output file is used.

Tips on Practical Use

Examples

> cp-offline -k 3 train_file test_file output_file

Train a conformal predictor on train_file, using 3-nearest neighbors as the non-conformity measure. Then apply this predictor to test_file and write the results to output_file.
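
To show what a k-nearest-neighbors non-conformity measure can look like, the sketch below uses a form that is common in the literature: the summed distance to the k nearest examples of the same class divided by the summed distance to the k nearest examples of other classes. This is an assumption, not a description of LibCP's implementation. Example, knn_score and p_value are hypothetical names, features are dense for brevity, and the p-value is computed the plain transductive way, which recomputes every score and therefore costs O(n^2) distance evaluations per candidate label.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical example type for this sketch (dense features for brevity).
struct Example {
  std::vector<double> x;
  int label;
};

double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return std::sqrt(sum);
}

// Sum of the k smallest values in d (all of them if d has fewer than k).
double sum_k_smallest(std::vector<double> d, std::size_t k) {
  k = std::min(k, d.size());
  std::partial_sort(d.begin(), d.begin() + k, d.end());
  double sum = 0.0;
  for (std::size_t i = 0; i < k; ++i) sum += d[i];
  return sum;
}

// Non-conformity score of data[i] relative to the rest of data.
// Larger scores mean the example looks stranger for its label.
double knn_score(const std::vector<Example>& data, std::size_t i,
                 std::size_t k) {
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> same, other;
  for (std::size_t j = 0; j < data.size(); ++j) {
    if (j == i) continue;
    const double d = euclidean(data[i].x, data[j].x);
    if (data[j].label == data[i].label) same.push_back(d);
    else other.push_back(d);
  }
  if (same.empty()) return inf;   // no support for this label at all
  if (other.empty()) return 0.0;  // nothing contradicts this label
  const double den = sum_k_smallest(other, k);
  // A zero distance to another class is treated as maximally strange.
  return den > 0.0 ? sum_k_smallest(same, k) / den : inf;
}

// p-value of assigning `candidate` to x: the fraction of examples (including
// the new one) whose score is at least as large as the new example's score.
double p_value(std::vector<Example> train, const std::vector<double>& x,
               int candidate, std::size_t k) {
  train.push_back(Example{x, candidate});
  const double test_score = knn_score(train, train.size() - 1, k);
  std::size_t at_least = 0;
  for (std::size_t i = 0; i < train.size(); ++i) {
    if (knn_score(train, i, k) >= test_score) ++at_least;
  }
  return static_cast<double>(at_least) / static_cast<double>(train.size());
}
```

Calling p_value once per candidate label gives the p-values from which a prediction, its confidence and its credibility can be derived.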

> cp-online data_file

Run an online conformal predictor on data_file, using 1-nearest neighbor (the default) as the non-conformity measure. Then write the results to the default output file.
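
In the online setting, each example is predicted using only the examples seen before it, and its true label is revealed afterwards. The sketch below shows that protocol with a running error count, where an error means the true label falls outside the prediction set at level epsilon. It is an assumption about the general protocol, not a description of cp-online's internals; Example, PValueFn and online_errors are hypothetical names, and the p-value computation is passed in as a callback.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical example type for this sketch.
struct Example {
  std::vector<double> x;
  int label;
};

// Hypothetical callback: p-value of `candidate` for features `x`, given the
// examples seen so far.
using PValueFn = std::function<double(const std::vector<Example>& seen,
                                      const std::vector<double>& x,
                                      int candidate)>;

// Returns the cumulative number of errors after each step of the stream.
std::vector<std::size_t> online_errors(const std::vector<Example>& stream,
                                       double epsilon,
                                       const PValueFn& p_value) {
  std::vector<Example> seen;
  std::vector<std::size_t> cumulative;
  std::size_t errors = 0;
  for (const Example& e : stream) {
    // Predict before the label is revealed: the true label is inside the
    // prediction set exactly when its p-value exceeds epsilon.
    if (p_value(seen, e.x, e.label) <= epsilon) ++errors;
    cumulative.push_back(errors);
    seen.push_back(e);  // reveal the label and add the example to the history
  }
  return cumulative;
}
```

For a valid conformal predictor on exchangeable data, the long-run error rate should stay close to epsilon, which is what makes the cumulative count a useful diagnostic.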

> cp-cv -v 10 data_file

Run a 10-fold cross validation of the conformal predictor on data_file, using 1-nearest neighbor (the default) as the non-conformity measure. Then write the results to the default output file.
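
For readers unfamiliar with k-fold cross validation, the sketch below shows one simple way to assign examples to folds: shuffle the indices, then deal them out in round-robin order. Each fold is then used once for testing while the remaining folds are used for training. Whether cp-cv shuffles or stratifies by class is not stated in this document, so this is only an assumed scheme; assign_folds is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns fold_of, where fold_of[i] in 0 .. num_folds-1 is the fold in which
// example i is used for testing. Fold sizes differ by at most one.
// Assumes num_folds > 0.
std::vector<std::size_t> assign_folds(std::size_t num_examples,
                                      std::size_t num_folds,
                                      unsigned seed) {
  std::vector<std::size_t> order(num_examples);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);

  std::vector<std::size_t> fold_of(num_examples);
  for (std::size_t pos = 0; pos < num_examples; ++pos) {
    fold_of[order[pos]] = pos % num_folds;
  }
  return fold_of;
}
```

With -v 10, for instance, each example would be tested exactly once, by a predictor trained on the other nine folds.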

Library Usage

The functions and structures are declared across several header files. The library has four parts: utilities, knn, cp, and the driver programs.

utilities.h and utilities.cpp

utilities.h declares the structure Problem, which stores a data set, the structure Node, which stores one attribute as an index and value pair, and all the constants.

utilities.h also declares some utility functions and function templates.

knn.h and knn.cpp

knn.h declares the structure KNNParameter, which stores the kNN parameters, and the structure KNNModel, which stores the kNN model.

knn.h also declares some utility functions and function templates.

cp.h and cp.cpp

cp.h declares the structure Parameter, which stores the conformal prediction parameters, and the structure Model, which stores the conformal prediction model. You need to #include "cp.h" in your C/C++ source files and link your program with cp.cpp. See cp-offline.cpp, cp-online.cpp and cp-cv.cpp for examples of how to use them.

cp.h also declares some utility functions and function templates.

cp-offline.cpp, cp-online.cpp and cp-cv.cpp

These three files are the driver programs for LibCP. cp-offline.cpp trains and tests on data sets in the offline setting. cp-online.cpp runs online prediction on data sets. cp-cv.cpp runs cross validation on data sets.

The structure of these files is similar. Each program parses the command-line inputs, reads the data sets into memory, calls the training and prediction process, measures performance, and finally releases the memory it allocated. It includes the following functions.

Additional Information

For questions and comments, please email c.zhou@cs.rhul.ac.uk

Acknowledgments

Special thanks to Chih-Chung Chang and Chih-Jen Lin, who are the authors of LibSVM.