This is an Open Access publication published under CSC-OpenAccess Policy.
Some Imputation Methods to Treat Missing Values in Knowledge Discovery in Data warehouse
Dr. Diwakar Shukla, Rahul Singhai
Pages - 1 - 13     |    Revised - 30-06-2010     |    Published - 10-08-2010
Volume - 1   Issue - 2    |    Publication Date - July 2010
Data Preprocessing, Data Mining, Missing Values, Imputation, Data Cleaning, Data Reduction
One major problem in the data cleaning and data reduction step of the KDD process is the presence of missing values in attributes. Many analysis tasks have to deal with missing values, and several treatments have been developed to estimate them. One of the most common methods of replacing missing values is mean imputation. In this paper we propose a new imputation method that combines the factor-type and compromised imputation methods under a two-phase sampling scheme, and we use it to impute the missing values of a target attribute in a data warehouse. Our simulation study shows that the estimator of the mean obtained from this method is more efficient than the other estimators compared.
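For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of plain mean imputation only. It does not reproduce the paper's factor-type compromised method or its two-phase sampling scheme, whose formulas are not given on this page. The function name `mean_impute` and the sample `income` values are hypothetical, chosen purely for illustration.

```python
import math


def mean_impute(values):
    """Replace missing entries (None or NaN) with the mean of the observed entries.

    Illustrative baseline only; not the estimator proposed in the paper.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in values]


# Hypothetical target attribute with two missing entries.
income = [40.0, None, 55.0, 61.0, None]
print(mean_impute(income))
```

Because every missing entry receives the same value, mean imputation leaves the sample mean unchanged but shrinks the observed variance, which is one reason alternative imputation methods are studied.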
CITED BY (5)  
1 Suguna, N. (2014). Certain investigations on Classification of medical datasets Using soft computing techniques.
2 Diwakar, S., Singhai, R., & Thakur, N. S. Power constant based methods for dealing with missing values in knowledge discovery.
3 Shukla, D., Singhai, R., & Thakur, N. S. (2011). A New Imputation Method for Missing Attribute Values in Data Mining. Journal of Applied Computer Science & Mathematics, (10).
4 Gangele, S., Shukla, D., Verma, K., & Singh, P. (2011). Elasticities and Index Analysis of Usual Internet Traffic Share Problem. International Journal of Advanced Research in Computer Science, 2(4).
5 Shukla, D., Verma, K., & Gangele, S. Re-attempt connectivity to internet analysis of user by Markov chain model.
Associate Professor Dr. Diwakar Shukla
Dr. H.S.G. Central University, Sagar (M.P.) - India
Mr. Rahul Singhai
Devi Ahilya University - India