Unsupervised Feature Selection Based on the Distribution of Features Attributed to Imbalanced Data Sets
Mina Alibeigi, Sattar Hashemi, Ali Hamzeh
Pages - 14 - 22     |    Revised - 31-03-2011     |    Published - 04-04-2011
Volume - 2   Issue - 1    |    Publication Date - March / April 2011
KEYWORDS
Feature, Feature Selection, Filter Approach, Imbalanced Data Set
ABSTRACT
Because high-dimensional data is computationally expensive and sometimes intractable to process, several feature reduction methods have recently been developed to reduce the dimensionality of the data and simplify analysis in applications such as text categorization, signal processing, image retrieval, and gene expression analysis. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original features. However, most current feature selection methods perform poorly on imbalanced data sets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method for imbalanced data sets that removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods were implemented and compared against it. Experimental results on several imbalanced data sets derived from the UCI repository illustrate the effectiveness of the proposed method relative to the compared methods in terms of both accuracy and the number of selected features.
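The abstract does not state which distribution measure the method uses, so the following is only a minimal sketch of one possible reading of "removing redundant features based on the distribution of features", not the authors' algorithm. It assumes each feature is summarized by a normalized histogram after min-max scaling and that two features count as redundant when the total variation distance between their histograms falls below a threshold. The names `histogram`, `total_variation`, `select_features`, `threshold`, and `bins` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int = 10) -> List[float]:
    """Normalized histogram of one feature after min-max scaling to [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    counts = [0] * bins
    for v in values:
        # A constant feature (span == 0) puts every value in the first bin.
        pos = (v - lo) / span if span > 0 else 0.0
        idx = min(int(pos * bins), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    n = len(values)
    return [c / n for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def select_features(
    columns: Sequence[Sequence[float]],
    threshold: float = 0.1,  # assumed cut-off, not taken from the paper
    bins: int = 10,          # assumed histogram resolution
) -> List[int]:
    """Greedy filter: keep a feature only if its histogram differs from every kept one."""
    hists = [histogram(col, bins) for col in columns]
    kept: List[int] = []
    for i, h in enumerate(hists):
        if all(total_variation(h, hists[j]) > threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    f1 = [2.0 * v for v in f0]  # rescaled copy of f0, same distribution shape
    f2 = [0.0] * 9 + [1.0]      # heavily skewed feature
    print(select_features([f0, f1, f2]))  # indices of the retained features
```

No class labels are used anywhere, which is what makes this a filter-style, unsupervised procedure. Note that a purely distributional comparison treats any two features with the same marginal shape as redundant, even if they are statistically independent, so a practical method would likely need a stricter criterion than this sketch.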
CITED BY (10)  
1 Pant, H., & Srivastava, R. MINDEX_IB: A Feature Selection method for Imbalanced Dataset. IONOSPHERE, 34(2), 126-225.
2 Pant, H., & Srivastava, R. A survey on feature selection methods for imbalanced datasets.
3 ORESKI, D., & KLICEK, B. A novel feature selection techniques based on contrast set mining.
4 Jiangsheng Yi, & Wanglian Xi. (2013). Unsupervised feature unbalanced data selection method. Small Computer Systems, 34 (1), 63-66.
5 Jiangsheng Yi, & Wanglian Xi. (2013). Unsupervised feature selection method for imbalanced data. Computer Systems, 34 (1), 63-67.
6 Reyes, J. A., Montes, A., González, J. G., & Pinto, D. E. (2013). Clasificación de roles semánticos usando características sintácticas, semánticas y contextuales. Computación y sistemas, 17(2), 263-272.
7 Jiang, S. Y., & Wang, L. X. (2013). Unsupervised Feature Selection Method for Imbalanced Data. Journal of Chinese Computer Systems, 34(1), 63-67.
8 Reyes, J. A., Montes, A., González, J. G., & Pinto, D. E. (2013). Classifying Case Relations using Syntactic, Semantic and Contextual Features. Computación y Sistemas, 17(2).
9 Asaduzzaman, M., Kabir, A. M. E., Uddin, N., Mollah, A. S., & Nurunnabi, M. A Feature Selection Approach Using Asymmetry.
10 Cuaya, G., Munoz-Meléndez, A., & Morales, E. F. (2011). A minority class feature selection method. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (pp. 417-424). Springer Berlin Heidelberg.
Miss Mina Alibeigi
University - Iran
minaalibeigi@gmail.com
Dr. Sattar Hashemi
Shiraz University - Iran
Dr. Ali Hamzeh
Shiraz University - Iran