Home

Awesome

Multi-way Interacting Regression via Factorization Machines

This is a Python 2 implementation of MiFM algorithm for interaction discovery and prediction (M. Yurochkin, X. Nguyen, N. Vasiloglou to appear in NIPS 2017). Code written by Mikhail Yurochkin.

Overview

This is a demonstration of MiFM on Abalone data.

First compile cython code in cython folder. On Ubuntu run:

cython g_c_sampler.pyx
python setup.py build_ext --inplace

It implemets Gibbs sampling updates and prediction function

prediction/predict_f_all.py Python wrapper for Cython code to aggregate MCMC samples for prediction

py_scripts/train_f.py Python wrapper for Cython code to run Gibbs sampling

py_scripts/py_functions.py Gibbs sampling for hyperpriors and initialization

mifm_class.py Implements MiFM class; data preprocessing; posterior analysis of interactions

abalone_example.py downloads Abalone dataset and shows how to use MiFM and extract interactions

Implementation is designed to be used in the interactive mode (e.g. Python IDE like Spyder).

Usage guide

MiFM(K=5, J=50, it=700, lin_model=True, alpha=1., verbose=False, restart=5, restart_iter=50, thr=300, rate=25, ncores=1, use_mape=False)

Parameters:

K: rank of matrix of coefficients V

J: number of interactions (columns) in Z

it: number of Gibbs sampling iterations

lin_model: whether to include linear effects (w_1,...,w_D)

alpha: FFM_alpha parameter. Smaller values encourage deeper interactions

verbose: whether to print intermediate RMSE train scores

restart and restart_iter: how many initializations to try with restart_iter iterations each. Then best initialization based on training RMSE is used for fitting

ncores: how many cores to use for initialization with restarts

use_mape: whether to use AMAPE instead of RMSE to select best initialization

thr: number of MCMC iterations after which samples are collected (i.e. burn-in)

rate: each rate iteration is saved

Methods:

fit(X, y, cat_to_v, v_to_cat)

X: training data after one-hot encoding

y: response

cat_to_v: list of category to value after one-hot encoding (see example with Abalone data)

v_to_cat: dictionary of category to values before one-hot encoding (see example with Abalone data)

Returns list of MCMC samples. Each sample is a list [bias, linear coefficients, V, Z]

predict(self, X)

Note: can only be used on fitted object. Returns predicted values for testing data X using Monte Carlo estimator of the mean response.

score(self, X, y)

Makes predictions and computes RMSE or AMAPE