Publication link: https://www.nature.com/articles/s41598-017-04070-4
*This file is part of the CCS package for biclustering analysis
Name: Condition-dependent Correlation Subgroup (CCS)
Introduction to CCS
Condition-dependent Correlation Subgroup (CCS) is a biclustering algorithm for comprehensive discovery of functionally coherent biclusters from large-scale gene expression data. For details of the CCS algorithm, see our paper entitled “A GPU-accelerated algorithm for biclustering analysis and detection of condition-dependent coexpression network modules”.

The algorithm is implemented in C. See sections A and B for compilation and execution of the C code. The structure of the CCS algorithm is particularly suitable for parallel computing. A CUDA C based GPGPU implementation is included here as a parallel version of the algorithm. You need a programmable GPU card and a CUDA C compiler to compile and run it. Follow the steps in sections C and D.

The performance of CCS was tested on synthetic and real gene expression datasets. We also showed that CCS biclusters are equivalent to condition-dependent co-expression network modules. The related Python code (Python 2.7 or higher) is available in the "python_utility" directory. The synthetic and real gene expression datasets and the CCS biclustering results are available in the "Results" directory.
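To illustrate what "condition-dependent" correlation means, here is a minimal Python sketch. It is not the CCS algorithm and is not taken from the CCS source code. It only shows that two genes can be strongly correlated within a subset of samples while being weakly correlated across all samples. The expression values are made up, the names `pearson`, `gene_a`, `gene_b`, and `subset` are hypothetical, and the use of Pearson correlation compared against "theta" without taking the absolute value is an assumption made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up expression values for two genes across ten samples.
# The genes move together in the first five samples only.
gene_a = [1, 2, 3, 4, 5, 3, 1, 4, 2, 5]
gene_b = [2, 4, 6, 8, 10, 5, 5, 1, 4, 2]

theta = 0.8              # correlation threshold, as passed with -t
subset = list(range(5))  # hypothetical condition: the first five samples

r_all = pearson(gene_a, gene_b)
r_subset = pearson([gene_a[i] for i in subset],
                   [gene_b[i] for i in subset])

print("all samples:    r = %.3f, r >= theta: %s" % (r_all, r_all >= theta))
print("sample subset:  r = %.3f, r >= theta: %s" % (r_subset, r_subset >= theta))
```

In this toy example the pair passes the threshold only on the sample subset, which is the kind of gene and sample grouping a bicluster describes.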
Installation and Execution
A. Compilation of C code
- Change your current directory to "src": $ cd src
- Type "make" to create the executable "ccsbc" in the package's home (top-level) directory: $ make
- Go back to the parent directory: $ cd ../
B. Execute C code
Type the following command in Linux:
  ./ccs -t [correlation threshold] -i [input file] -o [output file]
Parameters:
  -t [0-1.0]: Correlation threshold "theta", between 0 and 1. The recommended value is 0.8.
  -i [input_data_file]: Path to the input data file.
  -o [output_data_file]: Path to the output data file.
  Example: ./ccs -t 0.8 -i ./Results/Synthetic_data_results/Data/Data_Constant_100_1_bicluster.txt -o ./Results/Output.txt
Additional parameters:
  -m [1 - number of genes/rows in the data matrix]: Number of base genes to consider when forming biclusters. The default is 1000, or the total number of genes when that is less than 1000.
  Example: ./ccs -t 0.8 -i ./Results/Synthetic_data_results/Data/Data_Constant_100_1_bicluster.txt -o ./Results/Output.txt -m 90
  -g [0.0 - 100.0]: Minimum gene set overlap, in percent, required for merging overlapping biclusters. The default is 100.0, meaning 100% overlap.
  Example: ./ccs -t 0.8 -i ./Results/Synthetic_data_results/Data/Data_Constant_100_1_bicluster.txt -o ./Results/Output.txt -g 50.0
  -p [0/1]: Output format. The default is 0.
    0 - Print output in 3 rows.
      Row 1: Number_of_rows[\t]Number_of_Columns[\t]Score
      Row 2: Gene_name_1[b]Gene_name_2[b] ...
      Row 3: Sample_name_1[b]Sample_name_2[b] ...
    1 - Print output in 2 rows (Bibench supported format).
      Row 1: Row_index_1[b]Row_index_2[b] ...
      Row 2: Column_index_1[b]Column_index_2[b] ...
  Example: ./ccs -t 0.9 -i ./Results/Synthetic_data_results/Data/Data_Constant_100_1_bicluster.txt -o ./Results/Output_standard.txt -m 50 -p 1 -g 100.0
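For downstream analysis, the default 3-row output format (-p 0) described above could be read with a short Python script. The following is a minimal sketch, not part of the CCS package. It assumes that each bicluster occupies three consecutive non-empty lines, that the [b] separator is whitespace, and that gene and sample names contain no spaces. The function name `parse_biclusters` and the sample text are hypothetical.

```python
def parse_biclusters(text):
    """Parse CCS output in the 3-row format into a list of dictionaries.

    Assumption: biclusters are written as consecutive groups of three
    non-empty lines (header, gene names, sample names).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of 3 non-empty lines")

    biclusters = []
    for i in range(0, len(lines), 3):
        # Row 1: Number_of_rows, Number_of_Columns, Score
        header = lines[i].split()
        biclusters.append({
            "n_rows": int(header[0]),
            "n_cols": int(header[1]),
            "score": float(header[2]),
            "genes": lines[i + 1].split(),    # Row 2: gene names
            "samples": lines[i + 2].split(),  # Row 3: sample names
        })
    return biclusters


# Made-up sample text in the 3-row layout, for illustration only.
sample_output = "2\t3\t0.95\ngeneA geneB\ns1 s2 s3\n"

for bc in parse_biclusters(sample_output):
    print(bc["n_rows"], bc["n_cols"], bc["score"], bc["genes"], bc["samples"])
```

To read a real result file, pass its contents to the function, for example `parse_biclusters(open("./Results/Output.txt").read())`.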
C. Compilation of CUDA C code
*Note that a CUDA-supported GPU card and a CUDA C compiler are required.
- Change your current directory to "CUDA_C": $ cd CUDA_C
- Type the following in the Linux command line: $ nvcc ./src/ccs.cu -lm -o ccs_cuda
D. Execute CUDA C code
Type the following command in Linux:
  ./ccs_cuda -t [correlation threshold] -i [input file] -o [output file]
  Example: ./ccs_cuda -t 0.9 -i ../Synthetic_data_results/Data/Data_Constant_100_1_bicluster.txt -o ./Output.txt -m 50 -p 1 -g 100.0
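When running many parameter combinations, it can help to assemble the command line in a script. The following Python sketch builds an argument list for either "./ccs" or "./ccs_cuda" from the options documented in this README (-t, -i, -o, -m, -g, -p) and checks the documented value ranges before anything is run. The function `build_ccs_command` and its parameter names are hypothetical and are not part of the CCS package.

```python
def build_ccs_command(executable, theta, input_file, output_file,
                      max_base_genes=None, overlap=None, output_format=None):
    """Return the argument list for a CCS run (hypothetical helper)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("-t must be between 0 and 1")
    cmd = [executable, "-t", str(theta), "-i", input_file, "-o", output_file]

    if max_base_genes is not None:
        if max_base_genes < 1:
            raise ValueError("-m must be at least 1")
        cmd += ["-m", str(max_base_genes)]

    if overlap is not None:
        if not 0.0 <= overlap <= 100.0:
            raise ValueError("-g must be between 0.0 and 100.0")
        cmd += ["-g", str(overlap)]

    if output_format is not None:
        if output_format not in (0, 1):
            raise ValueError("-p must be 0 or 1")
        cmd += ["-p", str(output_format)]

    return cmd


# Placeholder file names; replace them with real paths.
cmd = build_ccs_command("./ccs_cuda", 0.9, "input.txt", "Output.txt",
                        max_base_genes=50, overlap=100.0, output_format=1)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library once the executable has been compiled.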
Authors
• Anindya Bhattacharya, anindyamail123@gmail.com
• Yan Cui, ycui2@uthsc.edu
Contact
If you have comments or questions, or if you would like to contribute to the further development of CCS, please email us at anindyamail123@gmail.com and ycui2@uthsc.edu.
License
This project is licensed under the terms of the GNU General Public License v3.0.