This is an Open Access publication published under CSC-OpenAccess Policy.
Outlier Modification and Gene Selection for Binary Cancer Classification using Gaussian Linear Bayes Classifier
Md. Hadiul Kabir, Md. Nurul Haque Mollah
Pages - 13 - 24     |    Revised - 31-08-2015     |    Published - 30-09-2015
Volume - 9     |    Issue - 2     |    Publication Date - September 2015
KEYWORDS
Gene Expression, Outlier Modification, Top DE Genes Selection, Binary Classification, Gaussian Bayes Classifier, Misclassification Error Rate (MER).
ABSTRACT
The Gaussian linear Bayes classifier is one of the most popular approaches to classification. However, it is less widely used for cancer classification from gene expression data, because its covariance matrix cannot be reliably inverted when the number of gene variables is large and the number of cancer patients/samples in the training dataset is small. To overcome this problem, we propose using a few top differentially expressed (DE) genes from both the upregulated and downregulated groups for binary cancer classification with the Gaussian linear Bayes classifier. Top DE genes are usually selected by ranking the p-values of the t-test. However, both the t-test statistic and the Gaussian linear Bayes classifier are sensitive to outliers. We therefore also propose modifying outliers in the gene expression dataset before applying the proposed method, since gene expression datasets are often contaminated by outliers arising from the several steps involved in the data-generating process, from hybridization to image analysis. The performance of the proposed method is investigated using both simulated and real gene expression datasets. We observe that the proposed method, combined with outlier modification, improves performance for binary cancer classification.
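The three steps named in the abstract (outlier modification, selection of top DE genes, Gaussian linear Bayes classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the outlier rule, so a gene-wise median/MAD replacement within each class is assumed here as a stand-in. Genes are ranked by the pooled two-sample t statistic, which gives the same ordering as t-test p-values when every gene has the same degrees of freedom. All function names, the cutoff `c`, and the simulated data are hypothetical.

```python
import numpy as np

def modify_outliers(X, c=3.0):
    """ASSUMED rule (not from the paper): per gene, values more than c robust
    SDs from the median are replaced by the median. X is samples x genes."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    flagged = (mad > 0) & (np.abs(X - med) > c * mad)
    return np.where(flagged, med, X)

def modify_by_class(X, y):
    """Apply the outlier rule separately within each training class."""
    out = X.astype(float).copy()
    for label in (0, 1):
        out[y == label] = modify_outliers(out[y == label])
    return out

def top_de_genes(X, y, k_each=2):
    """Indices of the k_each most upregulated and k_each most downregulated
    genes (class 1 vs class 0), ranked by the pooled-variance t statistic."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    sp2 = ((n0 - 1) * X0.var(axis=0, ddof=1)
           + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    t = (X1.mean(axis=0) - X0.mean(axis=0)) / np.sqrt(sp2 * (1 / n0 + 1 / n1))
    order = np.argsort(t)
    return np.concatenate([order[-k_each:], order[:k_each]])

def fit_linear_bayes(X, y):
    """Gaussian Bayes rule with a shared (pooled) covariance matrix.
    Returns (w, b); a sample x is assigned to class 1 when x @ w + b > 0."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((n0 - 1) * np.atleast_2d(np.cov(X0, rowvar=False))
         + (n1 - 1) * np.atleast_2d(np.cov(X1, rowvar=False))) / (n0 + n1 - 2)
    w = np.linalg.solve(S, mu1 - mu0)            # S^{-1} (mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1) + np.log(n1 / n0)  # includes log prior ratio
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

def simulate(seed=0, n_per_class=20, n_genes=200, shift=3.0):
    """Hypothetical toy data: genes 0-4 up and genes 5-9 down in class 1,
    plus a few gross outliers added at random positions."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    X[y == 1, :5] += shift
    X[y == 1, 5:10] -= shift
    rows = rng.integers(0, 2 * n_per_class, size=10)
    cols = rng.integers(0, n_genes, size=10)
    X[rows, cols] += 30.0
    return X, y

X, y = simulate()
X_clean = modify_by_class(X, y)
genes = top_de_genes(X_clean, y, k_each=2)
w, b = fit_linear_bayes(X_clean[:, genes], y)
mer = np.mean(predict(X_clean[:, genes], w, b) != y)
print("selected genes:", genes, "training MER:", mer)
```

Because only a handful of genes enter the classifier, the pooled covariance matrix is small enough to invert even with few samples. The MER printed here is computed on the training data and is therefore optimistic; an honest estimate would use held-out samples or cross-validation.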
Mr. Md. Hadiul Kabir
Department of Statistics, University of Rajshahi, Bangladesh
Professor Md. Nurul Haque Mollah
University of Rajshahi, Bangladesh
mollah.stat.bio@ru.ac.bd