Data Quality Mining using Genetic Algorithm
International Journal of Computer Science and Security (IJCSS)
Volume:  3    Issue:  2
Pages:  62-153
Publication Date:   April 2009
ISSN (Online): 1985-1553
105 - 112
Sufal Das - India
Banani Saha - India
CSC Journals, Kuala Lumpur, Malaysia
KEYWORDS:   Data Quality, Genetic Algorithms, Association Rule Mining, Multi-objective Optimization 
Data quality mining (DQM) is a new and promising data mining approach from both the academic and the business point of view. Data quality is important to organizations, and people use information attributes as a tool for assessing it. The goal of DQM is to employ data mining methods to detect, quantify, explain and correct data quality deficiencies in very large databases. Data quality is crucial for many applications of knowledge discovery in databases (KDD). In this work, we consider four data quality measures: accuracy, comprehensibility, interestingness and completeness. We attempt to develop a multi-objective genetic algorithm (GA) based approach that exploits the linkage between feature selection and association rule mining. The main motivation for using a GA in the discovery of high-level prediction rules is that it performs a global search and copes better with attribute interaction than the greedy rule induction algorithms often used in data mining.
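To make the multi-objective idea concrete, the following is a minimal sketch of how a candidate association rule might be scored on the four measures named above and compared with other rules by Pareto dominance. The abstract does not give the formulas, so every definition here is an assumption chosen for illustration: accuracy is taken as rule confidence, completeness as the share of consequent transactions the rule covers, comprehensibility as a penalty on rule length, and interestingness as a product of the first two discounted by overall support. The function names (support_count, rule_objectives, dominates, pareto_front) are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Sequence, Tuple

Transaction = FrozenSet[str]
Scores = Tuple[float, float, float, float]


def support_count(items: FrozenSet[str], data: Sequence[Transaction]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in data if items <= t)


def rule_objectives(antecedent: FrozenSet[str],
                    consequent: FrozenSet[str],
                    data: Sequence[Transaction],
                    n_attributes: int) -> Scores:
    """Score the rule antecedent -> consequent on four objectives.

    All four formulas are illustrative assumptions, not the paper's.
    Returns (accuracy, comprehensibility, interestingness, completeness).
    """
    both = support_count(antecedent | consequent, data)
    sup_a = support_count(antecedent, data)
    sup_c = support_count(consequent, data)

    # Assumed: accuracy is the confidence of the rule.
    accuracy = both / sup_a if sup_a else 0.0
    # Assumed: completeness is the fraction of consequent cases covered.
    completeness = both / sup_c if sup_c else 0.0
    # Assumed: shorter rules are easier to understand.
    comprehensibility = 1.0 - (len(antecedent) + len(consequent)) / n_attributes
    # Assumed: discount rules that hold in almost every transaction.
    interestingness = accuracy * completeness * (1.0 - both / len(data))
    return (accuracy, comprehensibility, interestingness, completeness)


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scored: List[Tuple[str, Scores]]) -> List[str]:
    """Names of the rules that no other rule dominates."""
    return [name for name, s in scored
            if not any(dominates(other, s) for _, other in scored)]
```

In a multi-objective GA, each chromosome would encode one such rule, and selection would favour non-dominated rules instead of collapsing the four scores into a single weighted fitness value.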
1 Jochen Hipp, Ulrich Güntzer and Udo Grimmer, “Data Quality Mining - Making a Virtue of Necessity”, In Proceedings of the 6th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD) 2001.
2 R. Agrawal, R. Srikant, “Fast algorithms for mining association rules”, in Proceeding of the 20th Int’l Conference on Very Large Databases, Chile, 1994.
3 Imielinski, T., R. Agrawal and A. Swami, “Mining association rules between sets of items in large databases”. Proc. ACM SIGMOD Conf. Management of Data, pp: 207–216.
4 K.M. Faraoun, A. Rabhi, “Data dimensionality reduction based on genetic selection of feature subsets”, EEDIS UDL University- SBA, (2002).
5 Cheng-Hong Yang, Chung-Jui Tu, Jun-Yang Chang, Hsiou-Hsiang Liu, Po-Chang Ko, “Dimensionality Reduction using GA-PSO” (2001).
6 Pádraig Cunningham, “Dimension Reduction”, University College Dublin, Technical Report UCD-CSI-2007-7, August 8th, 2007.
7 Erick Cantu-Paz, “Feature Subset Selection, Class Separability, and Genetic Algorithms”, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, (1994).
8 M. Pei, E. D. Goodman, F. Punch, “Feature Extraction using genetic algorithm”, Case Center for Computer-Aided Engineering and Manufacturing W. Department of Computer Science,(2000).
9 Sufal Das, Bhabesh Nath, “Dimensionality Reduction using Association Rule Mining”, IEEE Region 10 Colloquium and Third International Conference on Industrial and Information Systems (ICIIS 2008) December 8-10, 2008, IIT Kharagpur, India
10 Hsu, W., B. Liu and S. Chen, “General impressions to analyze discovered classification rules”. Proc. of 3rd Intl. Conf. on Knowledge Discovery & Data Mining (KDD-97), pp: 31–36. AAAI Press. (1997)
11 Freitas, A.A., E. Noda and H.S. Lopes, “Discovering interesting prediction rules with a genetic algorithm”. Proc. Conf. Evolutionary Computation, (CEC-99), pp: 1322–1329. (1999)
12 Cristiano Pitangui, Gerson Zaverucha, “Genetic Based Machine Learning: Merging Pittsburgh and Michigan, an Implicit Feature Selection Mechanism and a New Crossover Operator”, Proceedings of the Sixth International Conference on Hybrid Intelligent Systems (HIS'06). (2006).
1 O. J. Oyelade and O. O. Oyejoke, “Knowledge Discovery from Students’ Result Repository: Association Rule Mining Approach”, International Journal of Computer Science and Security (IJCSS), 4(2), pp. 199 – 207, 2010.
2 E. Omara, T. E. Said and M. Mousa, “Employing Neural Networks for Assessment of Data Quality with Emphasis on Data Completeness”, International Journal on Artificial Intelligence and Machine Learning, 11(I), pp. 21--28, 2011.
3 M. Awad, “Optimization RBFNNs Parameters Using Genetic Algorithms: Applied on Function Approximation”, International Journal of Computer Science and Security (IJCSS), 4(3), pp. 295 – 307, 2010.
4 H. Miao, “A Multi-Operator Based Simulated Annealing Approach for Robot Navigation in Uncertain Environments”, International Journal of Computer Science and Security (IJCSS), 4(1), pp. 50 – 61, 2010.
5 R. M. Kumar and Dr. K. Iyakutti. “Application of Genetic algorithms for the prioritization of Association Rules”. IJCA Special Issue on Artificial Intelligence Techniques - Novel Approaches & Practical Applications (3), pp. 1–3, 2011.
6 E. Chandra and K. Nandhini, “Knowledge Mining from Student Data”, European Journal of Scientific Research 47(1), pp.156-163, 2010.