An Approach for Optimizing Ensemble Intrusion Detection Systems

An Approach for Optimizing Ensemble Intrusion Detection Systems

Abstract:

Intrusion Detection System is yet an interesting research topic. With a very large amount of traffic in real-time networks, feature selection techniques that are effectively able to find important and relevant features are required. Hence, the most important and relevant set of features is the key to improve the performance of intrusion detection system. This study aims to find the best relevant selected features that can be used as important features in a new IDS dataset. To achieve the aim, an approach for generating optimized ensemble IDS is developed. Six features selection methods are used and compared, i.e.: Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), Relief-F (R-F), One-R (OR) and Chi-Square (CS). The feature selection techniques produce sets of selected features. Each best selected number of features that are obtained from feature ranking step for respective feature selection technique will be used to classify attacks via four classification methods, i.e.: Bayesian Network (BN), Naïve Bayesian (NB), Decision Tree: J48 and SOM. Then, each feature selection technique with its respective best features is combined with each classifier method to generate ensemble IDSs. Lastly, the ensemble IDSs are evaluated using Hold-up, K-fold validation approaches, as well as F-Measure and statistical validation approaches. Experimental results using Weka tools on ITD-UTM dataset show the optimized ensemble IDSs using (SU and BN); using (CS and BN) or (CS and SOM) or (IG and NB); and using (OR and BN) with respective ten, four and seven best selected features achieves 81.0316%, 85.2593%, and 80.8625% of accuracy, respectively. In addition, ensemble IDSs using (SU and BN) and using (OR and J48) with ten and six best respective selected features, perform the best F-measure value, i.e.: 0.853 and 0.830, respectively. Indirect comparison with other ensemble IDS on different dataset is discussed.