This is an Open Access publication published under CSC-OpenAccess Policy.
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Expression Data
Dwitiya Tyagi-Tiwari, Sujoy Das, Manoj Jha, Namita Srivastava
Pages: 253-265    |    Revised: 31-08-2015    |    Published: 30-09-2015
Volume 9, Issue 5    |    Publication Date: September/October 2015
KEYWORDS
Biclustering Analysis, Gene Expression, Parallel Computing Toolbox, Fuzzy, MATLABMPI.
ABSTRACT
Biclustering is needed both to analyze the expression patterns of genes, by comparing rows of the gene expression matrix, and to analyze the expression profiles of samples, by comparing its columns. Biclustering therefore requires clustering both genes and samples. The algorithm presented in this paper is based on a two-way clustering approach in which genes and samples are each clustered with a parallel fuzzy C-means algorithm implemented with the Message Passing Interface (MPI); we call it MFCM. MFCM is applied to both genes and samples so as to maximize the membership function values of the data set. It reworks the parallel fuzzy two-way clustering algorithm for microarray gene expression data presented in [9] in order to study the efficiency of the algorithm and the improvement gained from parallelization. The algorithm uses a gene entropy measure to filter the clustered data and extract biclusters. The method is able to find highly correlated biclusters in gene expression data sets.
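The two-way fuzzy clustering step can be illustrated with a minimal sequential sketch. It assumes standard Bezdek fuzzy C-means [3] with fuzzifier m = 2 and Euclidean distance, and it assumes the "gene entropy measure" is the Shannon entropy of each gene's membership vector. The paper's exact entropy definition, initialisation, and rule for pairing gene clusters with sample clusters may differ, and the helper names (`fcm`, `membership_entropy`, `two_way_fcm`, `max_entropy`) are hypothetical. Genes are clustered as rows of the expression matrix and samples as rows of its transpose; genes with low entropy (confident assignments) are then paired with each sample cluster to form candidate biclusters.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Sequential fuzzy C-means (Bezdek) on a list of equal-length vectors.

    Returns (centers, U), where U[i][k] is the membership of point i in
    cluster k and every row of U sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = [[0.0] * dim for _ in range(c)]
    for _ in range(max_iter):
        # Center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers[k] = [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                          for d in range(dim)]
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        max_change = 0.0
        for i in range(n):
            # Clamp distances so a point sitting on a center cannot divide by zero.
            dists = [max(math.dist(points[i], v), 1e-12) for v in centers]
            for k in range(c):
                new_u = 1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                  for j in range(c))
                max_change = max(max_change, abs(new_u - U[i][k]))
                U[i][k] = new_u
        if max_change < tol:
            break
    return centers, U


def membership_entropy(row):
    """Shannon entropy of one membership vector (0 = crisp, ln c = uniform)."""
    return -sum(u * math.log(u) for u in row if u > 0)


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def two_way_fcm(expr, c_genes, c_samples, max_entropy):
    """Hypothetical two-way scheme: cluster genes (rows) and samples
    (columns) separately, drop genes whose membership entropy exceeds
    max_entropy, and pair each gene cluster with each sample cluster."""
    _, U_genes = fcm(expr, c_genes)
    _, U_samples = fcm(transpose(expr), c_samples)
    gene_groups = [[] for _ in range(c_genes)]
    for i, row in enumerate(U_genes):
        if membership_entropy(row) <= max_entropy:
            gene_groups[row.index(max(row))].append(i)
    sample_groups = [[] for _ in range(c_samples)]
    for j, row in enumerate(U_samples):
        sample_groups[row.index(max(row))].append(j)
    # Candidate biclusters: (gene indices, sample indices), skipping empty groups.
    return [(g, s) for g in gene_groups for s in sample_groups if g and s]
```

For example, on the 4-by-4 matrix `[[10, 10, 0, 0], [10, 11, 0, 1], [0, 0, 10, 10], [1, 0, 10, 11]]`, `two_way_fcm(expr, 2, 2, max_entropy=0.5)` should group genes {0, 1} and {2, 3} and samples {0, 1} and {2, 3}. Cluster labels depend on the random initialisation, so only the groupings, not their order, carry meaning.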

We have implemented the fuzzy C-means algorithm on the MATLAB parallel computing platform using MATLABMPI, a message-passing version of MATLAB [13], and use this approach to find biclusters in gene expression matrices. The biclustering step is also parallelized: a low-entropy filter function reduces the set of gene cluster centers, keeping the gene cluster centers with minimum entropy. The algorithm is tested on the well-known cell-cycle data sets of the budding yeast S. cerevisiae from Cho et al. [6] and Tavazoie et al. [22], on the breast cancer subtypes Basal A and Basal B, and on the leukemia data set of Golub et al. [19].
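The parallel part can be sketched along the lines of a common data-parallel fuzzy C-means scheme (see, e.g., [20]). In this scheme, the rows are block-partitioned across MPI processes, and each process computes per-cluster partial sums over its own rows. A global sum (an Allreduce) then yields exactly the centers a single process would compute. The membership update needs only the broadcast centers and each process's own rows, so it parallelizes without further communication. This is a plain-Python simulation under that assumption, not the MATLABMPI code of MFCM. The function names are hypothetical, and a real run would exchange the partial sums by message passing (for example MatlabMPI's `MPI_Send`/`MPI_Recv`, or `comm.allreduce` in Python's mpi4py).

```python
def local_partial_sums(chunk_points, chunk_U, c, m):
    """Work done by one process on its own block of rows: per-cluster
    weighted sums sum_i u_ik^m * x_i and weights sum_i u_ik^m."""
    dim = len(chunk_points[0])
    num = [[0.0] * dim for _ in range(c)]
    den = [0.0] * c
    for x, u_row in zip(chunk_points, chunk_U):
        for k in range(c):
            w = u_row[k] ** m
            den[k] += w
            for d in range(dim):
                num[k][d] += w * x[d]
    return num, den


def allreduce_sum(partials):
    """Stand-in for an MPI Allreduce with a SUM operation over all processes."""
    num = [row[:] for row in partials[0][0]]
    den = partials[0][1][:]
    for p_num, p_den in partials[1:]:
        for k in range(len(den)):
            den[k] += p_den[k]
            for d in range(len(num[k])):
                num[k][d] += p_num[k][d]
    return num, den


def parallel_centers(points, U, c, m, n_procs):
    """Center update computed as if rows were block-distributed over n_procs."""
    block = -(-len(points) // n_procs)  # ceiling division
    partials = []
    for rank in range(n_procs):
        lo, hi = rank * block, min((rank + 1) * block, len(points))
        if lo < hi:  # a process may receive no rows when n_procs is large
            partials.append(local_partial_sums(points[lo:hi], U[lo:hi], c, m))
    num, den = allreduce_sum(partials)
    return [[v / den[k] for v in num[k]] for k in range(c)]


def sequential_centers(points, U, c, m):
    """Reference single-process center update for comparison."""
    num, den = local_partial_sums(points, U, c, m)
    return [[v / den[k] for v in num[k]] for k in range(c)]
```

Because the global sums are associative, `parallel_centers(points, U, c, m, p)` should agree with `sequential_centers(points, U, c, m)` up to floating-point rounding for any number of processes p. Each process communicates only c * (dim + 1) numbers per iteration, however many genes it holds.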
REFERENCES
[1] A. Grama, A. Gupta, G. Karypis and V. Kumar, "Introduction to Parallel Computing", Addison-Wesley, 2003.
[2] A.H. Tewfik, A.B. Tchagang and I. Vertatschitsch, "Parallel Identification of Gene Biclusters with Coherent Evolutions", IEEE Transactions on Signal Processing, Vol. 54, No. 6, June 2006.
[3] J.C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York, 1981.
[4] B. Chandra, S. Shankera and S. Mishra, "A New Approach: Interrelated Two-Way Clustering of Gene Expression Data", Statistical Methodology, Vol. 3, pp. 93-102, 2006.
[5] C. Tang and A. Zhang, "Interrelated Two-Way Clustering and Its Application on Gene Expression Data", International Journal on Artificial Intelligence Tools, Vol. 14, No. 4, pp. 577-598, 2005.
[6] R.J. Cho, M.J. Campbell, E.A. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T.G. Wolfsberg, A.E. Gabrielian, D. Landsman, D.J. Lockhart and R.W. Davis, "A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle", Molecular Cell, Vol. 2, pp. 65-73, July 1998.
[7] D. Dembele and P. Kastner, "Fuzzy C-Means Method for Clustering Microarray Data", Bioinformatics, Vol. 19, No. 8, pp. 973-980, 2003.
[8] D. Tyagi, S. Das and N. Srivastava, "Parallel Two-Way Clustering for Microarray Gene Expression Data", International Journal of Computer Science Trends and Technology, Vol. 3, Issue 3, May-June 2015.
[9] D. Tyagi, S. Das and N. Srivastava, "Two-Way Clustering Analysis Using Parallel Fuzzy Approach for Microarray Gene Expression Data", 2015.
[10] G. Kerr, H.J. Ruskin, M. Crane and P. Doolan, "Techniques for Clustering Gene Expression Data", Computers in Biology and Medicine, Vol. 38, pp. 283-293, 2008.
[11] J. Hartigan, "Direct Clustering of a Data Matrix", Journal of the American Statistical Association, Vol. 67, No. 337, pp. 123-129, 1972.
[12] H. Geng, D. Bastola and H. Ali, "A New Approach to Clustering Biological Data Using Message Passing", Proceedings of the IEEE Computational Systems Bioinformatics Conference, 2004.
[13] J. Kepner, "Parallel Programming with MatlabMPI", High Performance Embedded Computing (HPEC) Workshop, MIT Lincoln Laboratory, Lexington, MA, 2002, http://arXiv.org/abs/astro-ph/0107406.
[14] W. Liu and L. Chen, "A Parallel Algorithm for Gene Expressing Data Biclustering", Journal of Computers, Vol. 3, No. 10, October 2008.
[15] L. Li, Y. Guo, W. Wu, Y. Shi, J. Cheng and S. Tao, "A Comparison and Evaluation of Five Biclustering Algorithms by Quantifying Goodness of Biclusters for Gene Expression Data", BioData Mining, Vol. 5, Article 8, 2012.
[16] M.E. Futschik and N.K. Kasabov, "Fuzzy Clustering of Gene Expression Data", Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, Vol. 1, pp. 414-419, 2002.
[17] The MathWorks, "Parallel Computing Toolbox 4 User's Guide", 2009.
[18] S.C. Madeira and A.L. Oliveira, "Biclustering Algorithms for Biological Data Analysis: A Survey", IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 1, No. 1, pp. 24-45, January-March 2004.
[19] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield and E.S. Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring", Science, Vol. 286, No. 5439, pp. 531-537, 1999.
[20] T. Kwok, K. Smith, S. Lozano and D. Taniar, "Parallel Fuzzy c-Means Clustering for Large Data Sets", LNCS 2400, Springer-Verlag, Berlin Heidelberg, pp. 365-374, 2002.
[21] Y. Cheng and G.M. Church, "Biclustering of Expression Data", Proceedings of ISMB'00, pp. 93-103, 2000.
[22] S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho and G.M. Church, "Systematic Determination of Genetic Network Architecture", Nature Genetics, Vol. 22, pp. 281-285, 1999.
[23] Y. Hoshida, J.-P. Brunet, P. Tamayo, T.R. Golub and J.P. Mesirov, "Subclass Mapping: Identifying Common Subtypes in Independent Disease Data Sets", PLoS ONE, Vol. 2, No. 11, 2007.
[24] https://www.ll.mit.edu/mission/cybersec/softwaretools/MATLABmpi/MATLABmpi.html.
[25] https://www.mpich.org/documentation/guides/.
Mr. Dwitiya Tyagi-Tiwari
Department of Mathematics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462051, India
dwitiya.sr@gmail.com
Dr. Sujoy Das
Department of Mathematics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462051, India
Dr. Manoj Jha
Department of Mathematics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462051, India
Dr. Namita Srivastava
Department of Mathematics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462051, India