An Empirical Comparison of Supervised Learning Processes.
Sanjeev Manchanda, Mayank Dave, S. B. Singh
Pages - 21 - 38     |    Revised - 15-06-2007     |    Published - 30-06-2007
Volume - 1   Issue - 1    |    Publication Date - June 2007
KEYWORDS
Data Mining, Knowledge Discovery in Databases, Supervised learning algorithms, Stacking
ABSTRACT
Data mining as a formal discipline is only two decades old, yet in this short span it has developed phenomenally and matured. In this paper, we present an empirical study of supervised learning processes, based on an empirical evaluation of different classification algorithms. We include most supervised learning processes based on different pre-pruning and post-pruning criteria, evaluated on ten datasets collected from internationally renowned agencies. Specific models are presented, results are generated, and issues arising in the different processes are analyzed. We also compare our results with benchmark results for the different datasets and classification algorithms. For every algorithm we report fifteen performance measures, selected from a set of twenty-three calculated measures, making this a comprehensive study.
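The core exercise the abstract describes — scoring several classifiers on the same labeled data under multiple performance measures — can be sketched in a few lines. The snippet below is illustrative only: the two stand-in classifiers and the toy labels are hypothetical, not taken from the paper. It computes two measures commonly used in such comparisons, accuracy and Cohen's kappa (the latter corrects observed agreement for agreement expected by chance).

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (po - pe) / (1 - pe)."""
    n = len(y_true)
    po = accuracy(y_true, y_pred)                # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # chance agreement from the marginal label distributions
    pe = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    return (po - pe) / (1 - pe)

if __name__ == "__main__":
    y_true = ["a", "a", "a", "b", "b", "b"]
    # hypothetical predictions from two classifiers under comparison
    predictions = {
        "clf1": ["a", "a", "b", "b", "b", "b"],
        "clf2": ["a", "a", "a", "a", "a", "b"],
    }
    for name, y_pred in predictions.items():
        print(name,
              round(accuracy(y_true, y_pred), 3),
              round(cohen_kappa(y_true, y_pred), 3))
```

In a full study of the kind the paper reports, the same loop would run over many datasets, cross-validation folds, and additional measures (precision, recall, AUC, and so on), with each classifier's scores tabulated per measure.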
INDEXED IN
1 Google Scholar
2 Academic Journals Database 
3 ScientificCommons 
4 Academic Index 
5 CiteSeerX 
6 refSeek 
7 Socol@r  
8 ResearchGATE 
9 Libsearch 
10 Bielefeld Academic Search Engine (BASE) 
11 Scribd 
12 WorldCat 
13 SlideShare 
14 PDFCAST 
15 PdfSR 
16 Chinese Directory Of Open Access 
REFERENCES
Witten I. H. and Frank E. “Data Mining: Practical machine learning tools and techniques with Java implementations”. Morgan Kaufmann, 2000
Atlas L., Connor J., and Park D. “A performance comparison of trained multi-layer perceptrons and trained classification trees”. In Systems, man and cybernetics: proceedings of the 1989 IEEE international conference, pages 915–920, Cambridge, Ma. Hyatt Regency, 1991
Ayer M., Brunk H., Ewing G., Reid W. and Silverman E. “An empirical distribution function for sampling with incomplete information”. Annals of Mathematical Statistics, 26, 641-647, 1955
Bauer E. and Kohavi R. “An empirical comparison of voting classification algorithms: Bagging, boosting, and variants”. Machine Learning, 36, 1999
Berry C. C. “The kappa statistic”. Journal of the American Medical Association, 1992
Blake C. and Merz C., UCI repository of machine learning databases, 1998
Breiman L., Friedman J. H., Olshen R. A. and Stone C. J. “Classification and Regression Trees”. Wadsworth and Brooks, Monterey, CA., 1984
Caruana R. and Niculescu-Mizil A. “An Empirical Comparison of Supervised Learning Algorithms”. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006
Cooper G. F., Aliferis C. F., Ambrosino R., Aronis J., Buchanan B. G., Caruana R., Fine M. J., Glymour C., Gordon G., Hanusa B. H., Janosky J. E., Meek C., Mitchell T., Richardson T. and Spirtes P. “An evaluation of machine learning methods for predicting pneumonia mortality”. Artificial Intelligence in Medicine, 9, 1997
Fahrmeir L., Haussler W. and Tutz G. “Diskriminanzanalyse”. In Fahrmeir, L. and Hamerle, A., editors, Multivariate statistische Verfahren. Verlag de Gruyter, Berlin, 1984
Fayyad U., Piatetsky-Shapiro G. and P. Smyth. “The KDD process for extracting useful knowledge from volumes of data”. CACM 39 (11), pp. 27-34, 1996
Friedman J., Hastie T. and Tibshirani R. “Additive Logistic Regression: a Statistical View of Boosting”. Stanford University,1998
Giudici P. “Applied data mining”. John Wiley and Sons. New York, 2003
Gorman R. P. and Sejnowski T. J. “Analysis of hidden units in a layered network trained to classify sonar targets”. Neural networks, 1 (Part 1):75–89, 1988
Hofmann H. J. “Die Anwendung des CART-Verfahrens zur statistischen Bonitätsanalyse von Konsumentenkrediten”. Zeitschrift für Betriebswirtschaft, 60:941–962, 1990
King R., Feng C. and Sutherland A. “Statlog: comparison of classification algorithms on large real world problems”. Applied Artificial Intelligence, 9, 1995
Kirkwood C., Andrews B. and Mowforth P. “Automatic detection of gait events: a case study using inductive learning techniques”. Journal of biomedical engineering, 11(23):511–516, 1989
Komarek P., Gray A., Liu T. and Moore A. “High Dimensional Probabilistic Classification for Drug Discovery”, Biostatics, COMPSTAT, 2004
LeCun Y., Jackel L. D., Bottou L., Brunot A., Cortes C., Denker J. S., Drucker H., Guyon I., Müller U. A., Säckinger E., Simard P. and Vapnik V. “Comparison of learning algorithms for handwritten digit recognition”. International Conference on Artificial Neural Networks (pp. 53–60), Paris, 1995
Lim T. S., Loh W.-Y. and Shih Y. S. “A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms”. Machine Learning, 40, 203-228, 2000
Mitchell T., Buchanan B., DeJon G., Dietterich T., Rosenbloom P. and Waibel A. "Machine Learning". Annual Review of Computer Science, vol. 4, pp. 417-433, 1990
Niculescu-Mizil A. and Caruana R. “Predicting good probabilities with supervised learning”. Proc. 22nd International Conference on Machine Learning (ICML'05), 2005
Nishisato S. “Analysis of Categorical Data: Dual Scaling and its Applications”. University of Toronto Press, Toronto, 1980
Perlich C., Provost F. and Simonoff J. S. “Tree induction vs. logistic regression: a learning-curve analysis”. J. Mach. Learn. Res., 4, 211-255, 2003
Platt J. “Probabilistic outputs for support vector machines and comparison to regularized likelihood methods”. Adv. in Large Margin Classifiers, 1999
Provost F. and Domingos P. “Tree induction for probability-based rankings”. Machine Learning, 2003
Provost F., Jensen D. and Oates T. “Efficient progressive sampling”. Fifth ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining. San Diego, USA. 1999
Provost Foster J. and Kohavi Ron, “On Applied Research in Machine Learning”. Machine Learning 30 (2-3): 127-132, 1998
Ripley B. “Statistical aspects of neural networks”. Chaos and Networks - Statistical and Probabilistic Aspects. Chapman and Hall, 1993
Robertson T., Wright F. and Dykstra R. “Order restricted statistical inference”. John Wiley and Sons, New York, 1988
Shadmehr R. and D’Argenio Z. “A comparison of a neural network based estimator and two statistical estimators in a sparse and noisy environment”. In IJCNN-90: proceedings of the international joint conference on neural networks, pages 289–292, Ann Arbor, MI. IEEE Neural Networks Council, 1990
Sonnenburg S, Rätsch G. and Schäfer C. “Learning interpretable SVMs for biological sequence classification”. Research in Computational Molecular Biology, Springer Verlag, pages 389-407, 2005
Spirkovska L. and Reid M. B. “An empirical comparison of ID3 and HONNs for distortion invariant object recognition”. In TAI-90: tools for artificial intelligence: proceedings of the 2nd international IEEE conference, Los Alamitos, CA. IEEE Computer Society Press, 1990
Freund Y. and Schapire R. E. “Experiments with a new boosting algorithm”. Thirteenth International Conference on Machine Learning, San Francisco, 148-156, 1996
Zadrozny B. and Elkan C. “Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers”. ICML, 2001
Zadrozny B. and Elkan C. “Transforming classifier scores into accurate multi-class probability estimates”. KDD, 2002
Mr. Sanjeev Manchanda
- India
smanchanda@thapar.edu
Mr. Mayank Dave
- India
Mr. S. B. Singh
- India