Reconstruction of a Complete Dataset from an Incomplete Dataset by ARA (Attribute Relation Analysis): Some Results
Sameer S. Prabhune, S. R. Sathe
Pages - 35 - 42     |    Revised - 31-01-2011     |    Published - 08-02-2011
Volume - 1   Issue - 5    |    Publication Date - January / February
Data mining, Data preprocessing, Missing data
Preprocessing is a crucial step in a variety of data warehousing and mining tasks. Real-world data is noisy and often suffers from corrupted or incomplete values that may affect the models created from the data. The accuracy of any mining algorithm depends greatly on its input data sets. Incomplete data sets have become almost ubiquitous in a wide variety of application domains. Common examples can be found in climate and image data sets, sensor data sets, and medical data sets. The incompleteness in these data sets may arise from a number of factors: in some cases it may simply reflect certain measurements not being available at the time; in others the information may be lost due to partial system failure; or it may simply result from users being unwilling to specify attributes due to privacy concerns. When a significant fraction of the entries is missing in all of the attributes, it becomes very difficult to perform any kind of reasonable extrapolation on the original data. For such cases, we introduce the novel idea of attribute weightage, in which we assign a weight to every attribute in order to predict the complete data set from the incomplete data set, so that data mining algorithms can be applied to it directly. The appeal of the approach lies in placing weights on the attributes and finally averaging them. We demonstrate the effectiveness of the approach on a variety of real data sets. This paper describes the theory and implementation of a new filter, ARA (Attribute Relation Analysis), for the WEKA workbench, which finds the complete dataset from an incomplete dataset.
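The abstract does not spell out how the attribute weights are computed, so the following is only an illustrative sketch of the general idea of weighting attributes and averaging, not the authors' ARA filter. It assumes numeric attributes, uses the absolute Pearson correlation as a hypothetical weight, and predicts each missing value as the weighted average of simple linear predictions from the other attributes. The function names (`impute_weighted`, `_fit`, `_pairs`) are invented for this example.

```python
from math import sqrt


def _pairs(rows, j, k):
    """Collect (x, y) = (attribute k, attribute j) from rows where both are observed."""
    return [(r[k], r[j]) for r in rows if r[j] is not None and r[k] is not None]


def _fit(pairs):
    """Least-squares line y ~ x; returns (slope, intercept, |correlation|) or None."""
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant attribute carries no usable relation
    slope = sxy / sxx
    return slope, my - slope * mx, abs(sxy) / sqrt(sxx * syy)


def impute_weighted(rows):
    """Fill None entries with a correlation-weighted average of per-attribute predictions.

    ASSUMPTION: the weighting scheme here is hypothetical, chosen for illustration.
    Falls back to the column mean when no other attribute can contribute.
    """
    n_cols = len(rows[0])
    completed = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_mean = sum(observed) / len(observed) if observed else None
        fits = {k: _fit(_pairs(rows, j, k)) for k in range(n_cols) if k != j}
        for i, r in enumerate(rows):
            if r[j] is not None:
                continue
            num = den = 0.0
            for k, fit in fits.items():
                if fit is None or r[k] is None:
                    continue
                slope, intercept, weight = fit
                num += weight * (slope * r[k] + intercept)
                den += weight
            completed[i][j] = num / den if den > 0 else col_mean
    return completed


data = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, None, 5]]
filled = impute_weighted(data)
```

In this toy input the second attribute is exactly twice the first and the third is constant, so only the first attribute receives a nonzero weight when the missing entry is filled. Predictions are fitted on the originally observed values only, so imputed values never feed back into later estimates.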
Mr. Sameer S. Prabhune
Shri Sant Gajanan Maharaj College of Engineering, Shegaon - India
Dr. S. R. Sathe
Visvesvaraya National Institute of Technology, Nagpur - India