Fast Bayesian nonnegative matrix factorisation and tri-factorisation

This project contains an implementation of the nonnegative matrix factorisation and tri-factorisation models presented in the paper Fast Bayesian nonnegative matrix factorisation and tri-factorisation, accepted for the NIPS 2016 Workshop on Advances in Approximate Bayesian Inference. For both models we implement four inference methods: Gibbs sampling, variational Bayesian inference, iterated conditional modes, and non-probabilistic inference. We also provide all datasets used (including the preprocessing scripts) and the Python scripts for the experiments.

An extended version of this project, with more experiments and automatic model selection (using automatic relevance determination) can be found here; paper here.

<img src="./images/mf_mtf.png" width="100%"/>

Paper abstract

We present a fast variational Bayesian algorithm for performing nonnegative matrix factorisation and tri-factorisation. We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and does not require additional samples to estimate the posterior. We show that in particular for matrix tri-factorisation convergence is difficult, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.

Authors

Thomas Brouwer, Jes Frellsen, Pietro Lio'. Contact: thomas.a.brouwer@gmail.com.

Installation

If you wish to use the matrix factorisation models, or replicate the experiments, follow these steps. Please ensure you have Python 2.7 installed (Python 3 is currently not supported).

  1. Clone the project to your computer, by running git clone https://github.com/ThomasBrouwer/BNMTF.git in your command line.

  2. In your Python script, add the project to your system path using the following lines.

    project_location = "/path/to/folder/containing/project/"
    import sys
    sys.path.append(project_location) 
    

    For example, if the path to the project is /johndoe/projects/BNMTF/, use project_location = "/johndoe/projects/". If you intend to rerun some of the paper's experiments, those scripts automatically add the correct path.

  3. You can now import the models in your code, e.g.

    # numpy is needed to build the input matrices
    import numpy
    from BNMTF.code.models.nmf_np import NMF

    model = NMF(R=numpy.ones((4,3)), M=numpy.ones((4,3)), K=2)
    model.initialise()
    model.run(iterations=10)
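
For intuition about what a non-probabilistic factorisation of this kind computes, the following is a minimal, standalone NumPy sketch. It is not the project's implementation and does not use its classes. It assumes that M is a binary mask marking which entries of R are observed, and it uses standard multiplicative updates for the masked squared error; the function names `masked_nmf` and `masked_mse` are hypothetical and exist only for this illustration.

```python
import numpy


def masked_nmf(R, M, K, iterations=200, seed=0, eps=1e-9):
    """Hypothetical helper: find nonnegative U, V with R ~ U.dot(V.T) where M == 1."""
    rng = numpy.random.RandomState(seed)
    I, J = R.shape
    # Strictly positive random initialisation.
    U = rng.uniform(0.1, 1.0, size=(I, K))
    V = rng.uniform(0.1, 1.0, size=(J, K))
    MR = M * R  # observed entries only
    for _ in range(iterations):
        # Multiplicative updates: ratios of nonnegative terms keep U and V nonnegative.
        U *= MR.dot(V) / ((M * U.dot(V.T)).dot(V) + eps)
        V *= MR.T.dot(U) / ((M * U.dot(V.T)).T.dot(U) + eps)
    return U, V


def masked_mse(R, M, U, V):
    """Mean squared error over the observed entries only."""
    return (M * (R - U.dot(V.T)) ** 2).sum() / M.sum()


R = numpy.ones((4, 3))
M = numpy.ones((4, 3))
M[0, 0] = 0.0  # assumption: a 0 in M marks an unobserved entry
U, V = masked_nmf(R, M, K=2)
mse = masked_mse(R, M, U, V)
print("MSE on observed entries: %.6f" % mse)
```

The masked entry does not contribute to the updates or to the error, so the product `U.dot(V.T)` at that position acts as a prediction for the unobserved value.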

Examples

You can find good examples of the models running on data in the convergence experiment on the toy data, e.g. nonnegative matrix factorisation with Gibbs sampling.

Citation

If this project was useful for your research, please consider citing the extended paper:

Thomas Brouwer, Jes Frellsen, and Pietro Lió (2017). Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2017).

@inproceedings{Brouwer2017b,
	author = {Brouwer, Thomas and Frellsen, Jes and Li\'{o}, Pietro},
	booktitle = {Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)},
	title = {{Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation}},
	year = {2017}
}

Project structure

<details> <summary>Click here to find a description of the different folders and files available in this repository.</summary> <br>

/code/

Python code, for the models, cross-validation methods, and model selection.

/models/: Python classes for the BNMF and BNMTF models: Gibbs sampling, Variational Bayes, Iterated Conditional Modes, and non-probabilistic versions.

/grid_search/: Classes for doing model selection on the Bayesian NMF and NMTF models, and for doing cross-validation with model selection. The selection criterion to minimise or maximise can be the MSE, ELBO, AIC, BIC, or log likelihood.
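
As a rough illustration of how AIC and BIC trade off fit against model size, here is a small standalone sketch using the textbook definitions. It is not taken from the /grid_search/ classes. It assumes Gaussian noise with precision `tau` and counts `K * (I + J)` free parameters for a matrix factorisation of an I x J matrix; both are assumptions made for this example, and all function names are hypothetical.

```python
import math


def gaussian_log_likelihood(mse, n_observed, tau):
    """Log likelihood of n_observed entries under Gaussian noise with precision tau."""
    return (0.5 * n_observed * (math.log(tau) - math.log(2.0 * math.pi))
            - 0.5 * tau * n_observed * mse)


def aic(log_likelihood, n_parameters):
    """Akaike information criterion (lower is better)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def bic(log_likelihood, n_parameters, n_observed):
    """Bayesian information criterion (lower is better)."""
    return n_parameters * math.log(n_observed) - 2.0 * log_likelihood


# Illustrative values only: a fully observed 4 x 3 matrix and a perfect fit.
I, J = 4, 3
n_observed = I * J
ll = gaussian_log_likelihood(mse=0.0, n_observed=n_observed, tau=1.0)
for K in (1, 2):
    n_parameters = K * (I + J)  # assumed parameter count for this sketch
    print("K=%d AIC=%.2f BIC=%.2f" % (K, aic(ll, n_parameters),
                                      bic(ll, n_parameters, n_observed)))
```

With the fit held equal, both criteria favour the smaller K, which is the behaviour model selection relies on to avoid overly large factorisations.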

/data_toy/

Contains the toy data, and methods for generating toy data.

/data_drug_sensitivity/

Contains the drug sensitivity datasets (GDSC IC50, CCLE IC50, CCLE EC50, CTRP EC50).

/experiments/

/plots/

The results and plots for the experiments are stored in this folder, along with scripts for making the plots.

/tests/

py.test unit tests for the code and classes in /code/. To run the tests, simply cd into the /tests/ folder, and run pytest in the command line.

/images/

The images at the top of this README.

</br> </details>