GiniClust

GiniClust is a clustering method implemented in Python and R for detecting rare cell-types from large-scale single-cell gene expression data.

GiniClust can be applied to datasets originating from different platforms, such as multiplex qPCR data, traditional single-cell RNAseq or newly emerging UMI-based single-cell RNAseq, e.g. inDrops and Drop-seq.

GiniClust is created and maintained by the GC Yuan Lab at Harvard University and the Dana-Farber Cancer Institute and comes with a graphical user interface for convenience:

[Screenshot: GiniClust graphical user interface]

Installation

Please ensure that you have Python 2.7 in your environment. The graphical user interface of GiniClust relies on wxPython, a Python wrapper for the cross-platform wxWidgets API. Instructions on how to install wxPython are available on the corresponding website. On Fedora Linux, the following at the command-line interface worked just fine:

$ sudo dnf install wxPython

In addition, GiniClust relies on the following libraries:

Those packages should be automatically installed or upgraded via a pip installation. For instance, to install Gooey, proceed as follows:

If in doubt, please check that those libraries were installed properly by importing them, or some of their modules, in your Python interpreter: >>> import gooey, pkg_resources
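The import check described above can also be scripted, which is handy when verifying several environments. The following is a minimal sketch, not part of GiniClust itself: the helper name `check_modules` is hypothetical, and the module list contains only the two modules named in this README.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # The module (or one of its own dependencies) is not installed.
            status[name] = False
    return status


if __name__ == "__main__":
    # gooey and pkg_resources are the two modules named in the text above.
    report = check_modules(["gooey", "pkg_resources"])
    for name, ok in sorted(report.items()):
        print("%-15s %s" % (name, "OK" if ok else "MISSING"))
```

Any module reported as MISSING can then be installed or upgraded via pip before launching the graphical user interface.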

As for the R code at the core of much of GiniClust's computations: on Mac and Windows, only the official R installer is supported and tested. Other installation methods, such as brew, may lead to runtime errors.

In addition, some users might experience issues installing another of GiniClust's dependencies: the MAST R package. If this happens, please visit the MAST website (https://github.com/RGLab/MAST) for detailed instructions. We recommend that users upgrade the MAST package to the newest version. If you are using an old version, you may need to replace the file DE_MAST.R in 'Rfunction' with the DE_MAST.R found in 'Archive'.

Input file format

The input file is a gene expression matrix in comma-separated value (csv) format.

Specifically, for qPCR data, each row holds the log2 gene expression levels of one gene; for RNA-seq data, each row holds UMI counts per cell or raw read counts per cell. Note that log2-transformed RNA-seq data may not work with GiniClust. We suggest that users obtain raw read counts with featureCounts (http://subread.sourceforge.net/) or htseq-count (http://www-huber.embl.de/users/anders/HTSeq/doc/counting.html). The first row contains cell IDs. The first column contains unique gene names.

For example, you can take a look at one of our test datasets (stored in the sample_data folder within GiniClust's repository) in R:

> ExprM.RawCounts <- read.csv("Data_GBM.csv", sep=",", header=TRUE)
> ExprM.RawCounts[1:4,1:4]

TableMGH26MGH26.1MGH26.2MGH26.3
1/2-SBSRNA404700
A1BG418030
A1BG-AS10000
A1CF0000
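The format rules above (cell IDs in the first row, unique gene names in the first column, raw counts rather than log2 values for RNA-seq) can be checked before running GiniClust. The sketch below is an illustration only: `validate_expression_csv` is a hypothetical helper, not a GiniClust function, and treating "raw count" as "non-negative whole number" is an assumption made here.

```python
import csv
import io


def validate_expression_csv(handle, data_type="RNA-seq"):
    """Return a list of problems found in a gene expression matrix.

    Assumed layout, taken from the description above: the first row holds
    cell IDs, the first column holds unique gene names, and every other
    field is numeric.
    """
    rows = list(csv.reader(handle))
    if len(rows) < 2:
        return ["file needs a header row and at least one gene row"]
    problems = []
    n_columns = len(rows[0])
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        gene = row[0]
        if gene in seen:
            problems.append("line %d: duplicate gene name %r" % (line_no, gene))
        seen.add(gene)
        if len(row) != n_columns:
            problems.append("line %d: expected %d fields, found %d"
                            % (line_no, n_columns, len(row)))
        for value in row[1:]:
            try:
                number = float(value)
            except ValueError:
                problems.append("line %d: %r is not numeric" % (line_no, value))
                continue
            # Assumption: raw RNA-seq counts are non-negative whole numbers.
            if data_type == "RNA-seq" and (number < 0 or not number.is_integer()):
                problems.append("line %d: %r is not a raw count" % (line_no, value))
    return problems


if __name__ == "__main__":
    # Tiny made-up matrix, used only to show the call.
    sample = io.StringIO(u"Gene,Cell1,Cell2\nGeneA,4,0\nGeneB,0,7\n")
    print(validate_expression_csv(sample))
```

To check a real file, pass an open file handle in place of the in-memory sample. For qPCR data, call the helper with `data_type="qPCR"`, which skips the raw-count check since log2 values may be fractional or negative.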

Usage

To run GiniClust, please download the GiniClust GitHub repository, unzip it and move to the extracted directory so that it becomes your current working directory.

Then, in a Linux environment, proceed as follows:

From an OS X or Windows environment, proceed as follows:

A graphical user interface will open and prompt you to choose a file to process from your directory tree, to specify the type of data at hand (qPCR or RNA-seq), and to name the folder where you would like to store GiniClust's output (see the Results section below for more information about those files). A screenshot is provided below:

[Screenshot: GiniClust file and data-type selection dialog]

Alternatively, GiniClust can be run directly as an R script at the command-line interface:

$ Rscript GiniClust_Main.R [options]

You can specify the following options:

For example, the following command analyzes the 'Data_GBM.csv' dataset:

$ Rscript GiniClust_Main.R -f Data_GBM.csv -t RNA-seq -o GBM_results

The following command analyzes the 'Data_qPCR.csv' dataset:

$ Rscript GiniClust_Main.R -f Data_qPCR.csv -t qPCR -o qPCR_results
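When several datasets need to be processed, the two commands above can be driven from Python. This is a hedged sketch rather than a supported interface: `build_giniclust_command` and `run_giniclust` are hypothetical helpers, the `-f`, `-t` and `-o` flags are the ones shown in the examples above, and restricting the data type to "RNA-seq" or "qPCR" is an assumption based on those examples.

```python
import subprocess


def build_giniclust_command(input_file, data_type, output_dir,
                            script="GiniClust_Main.R"):
    """Assemble the Rscript argument list used in the examples above."""
    # Assumption: only the two data types shown in this README are accepted.
    if data_type not in ("RNA-seq", "qPCR"):
        raise ValueError("data_type must be 'RNA-seq' or 'qPCR'")
    return ["Rscript", script,
            "-f", input_file,
            "-t", data_type,
            "-o", output_dir]


def run_giniclust(input_file, data_type, output_dir):
    """Run GiniClust from the repository directory; return the exit code."""
    command = build_giniclust_command(input_file, data_type, output_dir)
    return subprocess.call(command)


if __name__ == "__main__":
    # Print the command instead of running it, so no R installation is needed.
    print(" ".join(build_giniclust_command("Data_GBM.csv", "RNA-seq",
                                           "GBM_results")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. `run_giniclust` must be called with the extracted GiniClust directory as the current working directory, as described at the start of this section.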

Results

The output directory specified by the user at the graphical user interface will contain the following files and directories:

Furthermore, a folder named 'Library' will be created, which includes a wealth of newly installed packages.

Reference

The GiniClust software was developed in support of a research project conducted at the GC Yuan Lab (Harvard University & DFCI). If you find it useful for your own investigations, please cite the following publication:

Jiang L, Chen H, Pinello L, Yuan GC. GiniClust: Detecting rare cell types from single-cell gene expression data with Gini Index. Genome Biology (2016) 17:144 DOI: 10.1186/s13059-016-1010-4

Credits

Lan Jiang (lan_jiang at hms dot harvard dot edu), the main developer of GiniClust, wrote the R scripts and started the README file. Gregory Giecold (ggiecold at jimmy dot harvard dot edu) developed the graphical user interface, reorganized the R packaging and edited the README file. Huidong Chen (hdchen at jimmy dot harvard dot edu) wrote the R command-line interface and contributed to the graphical user interface. Qian Zhu (qzhu at princeton dot edu) contributed to the graphical user interface. We would like to give special thanks to Luca Pinello who introduced the Gini-index and advised on the development and implementation of the GiniClust software.

Maintainers: Lan Jiang (lan_jiang at hms dot harvard dot edu), Huidong Chen (hdchen at jimmy dot harvard dot edu) and Qian Zhu (qzhu at princeton dot edu).

For more information on the biological motivations underlying this project, please contact Lan Jiang (lan_jiang at hms dot harvard dot edu) or Guo-Cheng Yuan (gcyuan at jimmy dot harvard dot edu).

License

Copyright 2016-2021 Lan Jiang and contributors.

GiniClust is free software made available under the MIT License. For details see the LICENSE file.