MuVI
A multi-view latent variable model with domain-informed structured sparsity that integrates noisy domain expertise in the form of feature sets.
Basic usage
The MuVI class is the main entry point for loading the data and performing the inference:
import numpy as np
import pandas as pd
import anndata as ad
import mudata as md
import muvi
# Load processed input data (missing values are allowed)
# Matrix of dimensions n_samples x n_rna_features
rna_df = pd.read_csv(...)
# Matrix of dimensions n_samples x n_prot_features
prot_df = pd.read_csv(...)
# Load prior feature sets, e.g. gene sets
gene_sets = muvi.fs.from_gmt(...)
# Binary matrix of dimensions n_gene_sets x n_rna_features
gene_sets_mask = gene_sets.to_mask(rna_df.columns)
# Create a MuVI object by passing both input data and prior information
model = muvi.MuVI(
    observations={"rna": rna_df, "prot": prot_df},
    prior_masks={"rna": gene_sets_mask},
    ...,
    device=device,
)
# Alternatively, create a MuVI model from AnnData (single-view)
rna_adata = ad.AnnData(rna_df, dtype=np.float32)
# varm is indexed by features, hence the transposed mask
rna_adata.varm['gene_sets_mask'] = gene_sets_mask.T
model = muvi.tl.from_adata(
    rna_adata,
    prior_mask_key="gene_sets_mask",
    ...,
    device=device
)
# Alternatively, create a MuVI model from MuData (multi-view)
prot_adata = ad.AnnData(prot_df, dtype=np.float32)
mdata = md.MuData({"rna": rna_adata, "prot": prot_adata})
model = muvi.tl.from_mdata(
    mdata,
    prior_mask_key="gene_sets_mask",
    ...,
    device=device
)
# Fit the model for a given number of training epochs
model.fit(batch_size, n_epochs, ...)
# Continue with the downstream analysis (see below)
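To make the shape of the prior mask concrete, here is a minimal, standard-library-only sketch of how a GMT file (one gene set per line: name, description, then tab-separated genes) maps to a binary matrix of dimensions n_gene_sets x n_features. It is not how muvi.fs is implemented; the helper names `parse_gmt` and `to_binary_mask` and the toy data are hypothetical.

```python
# Illustrative sketch only: NOT the muvi.fs implementation.
# parse_gmt and to_binary_mask are hypothetical helper names.


def parse_gmt(text):
    """Parse GMT-formatted text into {set_name: set_of_genes}."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without any genes
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def to_binary_mask(gene_sets, features):
    """Return (row_names, mask), where mask[i][j] is 1 if feature j belongs to set i."""
    row_names = list(gene_sets)
    mask = [[int(f in gene_sets[name]) for f in features] for name in row_names]
    return row_names, mask


# Hypothetical toy data: two gene sets, four measured features
gmt_text = "SET_A\tna\tG1\tG2\nSET_B\tna\tG2\tG3\tG9\n"
features = ["G1", "G2", "G3", "G4"]

row_names, mask = to_binary_mask(parse_gmt(gmt_text), features)
for name, row in zip(row_names, mask):
    print(name, row)
```

Genes that appear in a set but not among the measured features (G9 here) simply drop out of the mask, and features that belong to no set (G4) get an all-zero column.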
Submodules
The package consists of three additional submodules for analysing the results post-training:
- muvi.tl provides tools for downstream analysis, e.g.,
  - compute muvi.tl.variance_explained across all factors and views
  - muvi.tl.test the significance between the prior feature sets and the inferred factors
  - apply clustering on the latent space such as muvi.tl.leiden
  - muvi.tl.save the model in order to muvi.tl.load it at a later point in time
- muvi.pl works in tandem with muvi.tl by providing visualization methods such as
  - muvi.pl.variance_explained (see above)
  - plotting the latent space via muvi.pl.tsne, muvi.pl.scatter or muvi.pl.stripplot
  - investigating factors in terms of their inferred loadings with muvi.pl.inspect_factor
- muvi.fs serves the data structure and methods for loading, processing and storing the prior information from feature sets
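As a rough illustration of what "variance explained across all factors and views" means, the sketch below computes a per-factor R² for a single view with NumPy, using the common definition R²_k = 1 - ||Y - z_k w_kᵀ||² / ||Y||². Whether muvi.tl.variance_explained uses exactly this definition is an assumption; the function name, array names and simulated data are hypothetical.

```python
import numpy as np


def variance_explained_sketch(y, z, w):
    """Per-factor R^2 for one view (illustrative, not MuVI's own code).

    y: (n_samples, n_features) centered observations
    z: (n_samples, n_factors) factor scores
    w: (n_factors, n_features) factor loadings
    """
    total = np.sum(y ** 2)
    r2 = np.empty(z.shape[1])
    for k in range(z.shape[1]):
        # Residual after reconstructing the view from factor k alone
        residual = y - np.outer(z[:, k], w[k])
        r2[k] = 1.0 - np.sum(residual ** 2) / total
    return r2


# Simulated toy data: three factors, the third one inactive in this view
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))
w = rng.normal(size=(3, 50))
w[2] = 0.0
y = z @ w + 0.1 * rng.normal(size=(200, 50))
y = y - y.mean(axis=0)  # center each feature

r2 = variance_explained_sketch(y, z, w)
print(np.round(r2, 3))
```

A factor with all-zero loadings in a view reconstructs nothing, so its R² in that view is zero; repeating the computation per view gives a factors-by-views table.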
Tutorials
Check out our basic tutorial to get familiar with MuVI, or jump straight to a single-cell multiome analysis!
R users can readily export a trained MuVI model into R with a single line of code and resume the analysis with the MOFA2 package.
muvi.ext.save_as_hdf5(model, "muvi.hdf5", save_metadata=True)
See this vignette for more details!
Installation
We suggest using conda to manage your environments, and pip to install muvi as a Python package. Follow these steps to get muvi up and running!
- Create a Python environment in conda:
conda create -n muvi python=3.10
- Activate the freshly created environment:
conda activate muvi
- Install muvi with pip:
python3 -m pip install muvi
- Alternatively, install the latest version from GitHub with pip:
python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
Make sure to install a GPU version of PyTorch to significantly speed up the inference.
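A quick way to check whether the installed PyTorch build can see a GPU is sketched below. The helper name `pick_device` is hypothetical, and passing the resulting string as the device argument shown in the usage example above is an assumption about what MuVI accepts.

```python
def pick_device():
    """Return "cuda" if a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


# Assumption: a plain string like this is what gets passed as `device`
device = pick_device()
print(f"Using device: {device}")
```

If this prints "cpu" on a machine with a GPU, the installed PyTorch is most likely a CPU-only build.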
Citation
If you use MuVI in your work, please cite:
Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity
Arber Qoku and Florian Buettner
International Conference on Artificial Intelligence and Statistics (AISTATS) 2023