A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Categorical Data

Lukun Zheng

Pages - 1 - 12 | Revised - 31-01-2018 | Published - 30-04-2018

Published in International Journal of Scientific and Statistical Computing (IJSSC)

Volume - 7 Issue - 1 | Publication Date - April 2018

KEYWORDS

Categorical Variable, Imputation Methods, Missing Value, Re-Imputation Accuracy Rate.

ABSTRACT

Missing data are common in data sets and a recurring problem for researchers in many fields. Observations may have missing values for many reasons; for instance, some respondents may leave certain items unanswered. Missing data complicate statistical analyses, especially when a large fraction of the data is missing. Many methods have been developed for handling missing data, whether numeric or categorical. The performance of an imputation method is key to choosing which method to use. Performance is usually evaluated by how well the method supports inference about target parameters under a statistical model. One important parameter is the expected imputation accuracy rate, which, however, relies heavily on assumptions about the type of missing data and the imputation method. For instance, it may require that the data be missing completely at random. The goal of the current study was to develop a two-step algorithm for evaluating the performance of imputation methods for missing categorical data. The evaluation is based on the re-imputation accuracy rate (RIAR) introduced in the current work. A simulation study based on real data is conducted to demonstrate how the evaluation algorithm works.
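
The abstract names the re-imputation accuracy rate but does not define it, so the following is only an illustrative sketch of one plausible reading, not the paper's algorithm. It assumes that step one imputes the originally missing entries, and that step two hides a random subset of the originally observed entries, re-imputes them, and reports the fraction recovered correctly. The function names, the `mask_fraction` parameter, and the use of mode imputation are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def mode_impute(values):
    """Fill None entries with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def reimputation_accuracy(values, impute, mask_fraction=0.2, seed=0):
    """Assumed RIAR: share of hidden known values that re-imputation recovers.

    This definition is an assumption for illustration; the abstract does not
    state the paper's exact formula.
    """
    rng = random.Random(seed)

    # Step 1: impute the originally missing entries.
    completed = impute(values)

    # Step 2: hide some originally observed entries, then impute again.
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    n_mask = max(1, int(mask_fraction * len(observed_idx)))
    masked_idx = rng.sample(observed_idx, n_mask)

    masked = list(completed)
    for i in masked_idx:
        masked[i] = None
    reimputed = impute(masked)

    # Compare the re-imputed values with the true values that were hidden.
    hits = sum(reimputed[i] == values[i] for i in masked_idx)
    return hits / n_mask


if __name__ == "__main__":
    data = ["a", "a", "b", None, "a", "c", "a", None, "b", "a"]
    print(reimputation_accuracy(data, mode_impute))
```

Under this reading, the rate can be computed for several candidate imputation methods on the same data set and used to compare them, since it needs only the observed data and no knowledge of the true missing values.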

ABSTRACTING & INDEXING

1	Google Scholar

2	BibSonomy

3	Doc Player

4	Scribd

5	SlideShare

MANUSCRIPT AUTHORS

Dr. Lukun Zheng

Tennessee Technological University - United States of America

lzheng@tntech.edu
