Some Imputation Methods to Treat Missing Values in Knowledge Discovery in Data warehouse
Dr. Diwakar Shukla, Rahul Singhai
Pages - 1 - 13     |    Revised - 30-06-2010     |    Published - 10-08-2010
Volume - 1   Issue - 2    |    Publication Date - July 2010
Data Preprocessing, Data Mining, Missing Values, Imputation, Data Cleaning, Data Reduction
One major problem in the data cleaning and data reduction step of the KDD process is the presence of missing values in attributes. Many analysis tasks have to deal with missing values, and several treatments have been developed to estimate them. One of the most common methods for replacing missing values is mean imputation. In this paper we suggest a new imputation method that combines the factor-type and compromised imputation methods under a two-phase sampling scheme, and we use it to impute the missing values of a target attribute in a data warehouse. Our simulation study shows that the estimator of the mean obtained from this method is more efficient than the other estimators compared.
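For orientation, the baseline that the abstract refers to, mean imputation, can be sketched in a few lines of Python. This is only an illustration of the common mean method, not the factor-type compromised method proposed in the paper, whose formula is not given on this page. The function name `mean_impute` and the sample values are assumptions made for the example.

```python
import math


def mean_impute(values):
    """Replace missing entries (None or NaN) with the mean of the observed entries.

    Hypothetical helper for illustration; it implements plain mean imputation only.
    """
    # Keep only the entries that were actually observed.
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    # Fill every missing slot with the observed mean; leave observed values as-is.
    return [mean if (v is None or math.isnan(v)) else v for v in values]


# Made-up target attribute with two missing values.
target = [12.0, None, 15.0, 9.0, None]
imputed = mean_impute(target)
print(imputed)
```

Because every gap is filled with the same value, the mean of the completed attribute equals the mean of the observed values, while its variance is understated; this is one reason alternative imputation methods are studied.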
CITED BY (5)  
1 Suguna, N. (2014). Certain investigations on Classification of medical datasets Using soft computing techniques.
2 Diwakar, S., Singhai, R., & Thakur, N. S. Power constant based methods for dealing with missing values in knowledge discovery.
3 Shukla, D., Singhai, R., & Thakur, N. S. (2011). A New Imputation Method for Missing Attribute Values in Data Mining. Journal of Applied Computer Science & Mathematics, (10).
4 Gangele, S., Shukla, D., Verma, K., & Singh, P. (2011). Elasticities and Index Analysis of Usual Internet Traffic Share Problem. International Journal of Advanced Research in Computer Science, 2(4).
5 Shukla, D., Verma, K., & Gangele, S. Re-attempt connectivity to internet analysis of user by Markov chain model.
Associate Professor Dr. Diwakar Shukla
Dr. H.S.G. Central University, Sagar (M.P.), India
Mr. Rahul Singhai
Devi Ahilya University - India