
This is an Open Access publication published under CSC-OpenAccess Policy.
Data Preparation and Reduction Technique in Intrusion Detection Systems: ANOVA-PCA
Mohammed Nasser Mohammed, Mussa Mohamed Ahmed
Pages 167-182    |    Revised: 31-08-2019    |    Published: 01-10-2019
Volume 13, Issue 5    |    Publication Date: October 2019
Keywords: IDS, Supervised Classifiers, ANOVA-PCA.
An intrusion detection system (IDS) plays a major role in detecting anomalous and suspicious behavior in many organizational environments. The detection process involves collecting and analyzing real traffic data, which, in heavily loaded networks, is the most challenging aspect of designing an efficient IDS.

The collected data should therefore be prepared and reduced to improve both classification accuracy and computational performance.
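As a rough illustration of what preparing NSL-KDD records can involve, the Python sketch below one-hot encodes a symbolic feature (NSL-KDD has symbolic features such as protocol_type, service, and flag) and min-max scales a numeric one. These two steps, the function names `one_hot` and `min_max_scale`, and the toy values are assumptions for illustration only; the abstract does not state which preparation steps the paper uses.

```python
from typing import Dict, List, Sequence

# Illustrative preparation steps only (assumed, not taken from the paper).


def one_hot(values: Sequence[str]) -> List[List[float]]:
    """Encode a symbolic column (e.g. protocol_type) as 0/1 indicator columns."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index: Dict[str, int] = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[index[v]] = 1.0
        rows.append(row)
    return rows


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


# Hypothetical toy records: (protocol_type, src_bytes)
protocols = ["tcp", "udp", "tcp", "icmp"]
src_bytes = [491.0, 146.0, 0.0, 232.0]

encoded = one_hot(protocols)  # columns ordered icmp, tcp, udp
scaled = min_max_scale(src_bytes)
print(encoded[0], scaled)
```

Scaling to a common range keeps large-valued features such as byte counts from dominating distance-based classifiers like k-nearest neighbor.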

In this research, a proposed technique called ANOVA-PCA is applied to the NSL-KDD dataset, reducing its 41 features to 10. The technique is tested and evaluated with three supervised classifiers: k-nearest neighbor, decision tree, and random forest. The results are measured with various performance metrics and compared with those of other feature selection algorithms such as neighborhood component analysis (NCA) and ReliefF. They show that the proposed method is simple and computationally faster than the others, and that it achieves a classification accuracy of 98.9%.
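The abstract does not state how the ANOVA and PCA stages are combined. One plausible reading, assumed here rather than taken from the paper, is to score each feature with the one-way ANOVA F-statistic, F = [SS_between / (G - 1)] / [SS_within / (N - G)] for N samples in G classes, keep the highest-scoring features, and then project them onto their leading principal components. The pure-Python sketch below follows that reading; the function names, the Jacobi eigen-solver, the `n_keep` and `n_components` parameters, and the toy data are all hypothetical. Setting the final dimensionality to 10 would mirror the 41-to-10 reduction reported above, if those 10 dimensions are principal components.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def anova_f(column: Sequence[float], labels: Sequence[str]) -> float:
    """One-way ANOVA F-statistic of one feature across class labels (needs >= 2 classes)."""
    groups: Dict[str, List[float]] = {}
    for x, y in zip(column, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(column), len(groups)
    grand = sum(column) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((x - means[y]) ** 2 for y, g in groups.items() for x in g)
    if ss_within == 0:  # perfectly separated (or constant) feature
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def jacobi_eigen(a: Matrix, sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix via Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca(rows: Matrix, n_components: int) -> Matrix:
    """Project mean-centred rows onto the n_components leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    values, vectors = jacobi_eigen(cov)
    order = sorted(range(d), key=lambda i: values[i], reverse=True)[:n_components]
    return [[sum(r[j] * vectors[j][c] for j in range(d)) for c in order] for r in centred]


def anova_pca(rows: Matrix, labels: Sequence[str], n_keep: int,
              n_components: int) -> Tuple[List[int], Matrix]:
    """Hypothetical ANOVA-then-PCA reduction: keep the n_keep highest-F features, then PCA."""
    d = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(d)]
    kept = sorted(range(d), key=lambda j: scores[j], reverse=True)[:n_keep]
    reduced = [[r[j] for j in kept] for r in rows]
    return kept, pca(reduced, n_components)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0, 0.2], [1.2, 3.0, 0.1], [0.9, 4.0, 0.3],
        [5.0, 4.5, 0.6], [5.3, 3.5, 0.5], [4.8, 4.0, 0.9]]
labels = ["normal"] * 3 + ["attack"] * 3
kept, projected = anova_pca(rows, labels, n_keep=2, n_components=1)
print(kept)  # [0, 2]
```

The F-test is cheap because it looks at one feature at a time, which is consistent with the abstract's claim that the method is simple and fast; PCA then removes correlation among the surviving features. Plain Python is used only to keep the sketch dependency-free; in practice, library routines such as scikit-learn's `f_classif` and `PCA` would normally replace the hand-written `anova_f` and `jacobi_eigen`.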
Mr. Mohammed Nasser Mohammed
Faculty of Engineering / Information Technology, University of Aden, Yemen
Professor Mussa Mohamed Ahmed
Faculty of Engineering / Electronics and Communication, University of Aden, Yemen