Posted by aub_admin, November 10, 2025

Evaluating the Statistical Reliability of Machine Learning Intrusion Detection Models for Real Time High Traffic Networks

Md. Nazmus Salehin1,*, Mst. Irin Sultana2, Abdullah Rakib Akand3,*, Mostak Ahmed4, Syed Ali Haider5

1,2 Lecturer, Computer Science and Engineering (CSE), Bangladesh Army University of Engineering & Technology (BAUET).
3 Lecturer, Department of Computer Science and Engineering, Asian University of Bangladesh, Bangladesh.
4 Additional Director, ICT CELL, University of Dhaka, Bangladesh.
5 Software Engineer, Hajvery University, Lahore.

* Corresponding Authors: Md. Nazmus Salehin & Abdullah Rakib Akand

Journal: KZYJC | Volume: 40, Issue 10, 2025

Abstract

This study aimed to evaluate the statistical reliability of machine learning–based intrusion detection models in real-time, high-traffic network environments. Traditional offline evaluations often overestimate model performance because of random data partitioning, whereas temporal and real-time assessments provide more operationally relevant insights. A quantitative experimental design was employed, using three benchmark intrusion detection datasets: CICIDS2017, UNSW-NB15, and CSE-CIC-IDS2018. The models evaluated were Random Forest, Gradient Boosting (XGBoost), Support Vector Machine (SVM), Deep Neural Networks, and Autoencoder-based anomaly detectors. Data preprocessing preserved temporal order, handled missing values, removed redundant features, and scaled numerical variables. Real-time streaming simulations with sliding windows captured performance under concept drift and high-traffic conditions. Performance metrics included accuracy, precision, recall, F1-score, AUC, Brier Score, and Expected Calibration Error (ECE), with statistical reliability assessed via bootstrap confidence intervals and drift detection. In offline evaluation, XGBoost achieved the highest accuracy (0.989) and AUC (0.995), followed by Random Forest (accuracy 0.985, AUC 0.992). Real-time evaluation revealed lower mean F1-scores and higher variance; XGBoost maintained the highest mean F1-score (0.956 ± 0.015) and the lowest Brier Score (0.033) and ECE (2.9%), indicating superior calibration and stability. SVM and Autoencoder were the most sensitive to drift, with 11–13 ADWIN alerts and increased performance variability. Bootstrap confidence intervals further showed that offline superiority does not always translate to real-time reliability.
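The calibration and uncertainty measures named above can be illustrated with a minimal sketch. It assumes binary labels (1 = attack, 0 = benign), predicted attack probabilities, an equal-width 10-bin ECE, and a simple percentile bootstrap that resamples records independently. The abstract does not state which ECE variant or bootstrap scheme the authors used, so the function names and parameters here are hypothetical.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE: size-weighted mean of |mean predicted prob - observed
    positive rate| over equal-width probability bins (one common variant)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for _, p in b) / len(b)
        pos_rate = sum(y for y, _ in b) / len(b)
        ece += (len(b) / n) * abs(mean_prob - pos_rate)
    return ece


def bootstrap_ci(y_true, y_prob, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, y_prob).
    Assumption: records are resampled independently, which ignores any
    temporal dependence between network flows."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lower, upper


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    labels = [1, 0, 1, 0, 1, 1, 0, 0]
    probs = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8, 0.1, 0.3]
    print("Brier:", brier_score(labels, probs))
    print("ECE:", expected_calibration_error(labels, probs))
    print("95% CI for Brier:", bootstrap_ci(labels, probs, brier_score))
```

For time-ordered traffic, a block bootstrap that resamples contiguous segments may be a better fit than the independent resampling sketched here, since neighboring flows are rarely independent.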
The study demonstrates that statistically robust evaluation, including temporal validation, uncertainty quantification, and real-time simulation, is essential for assessing the operational effectiveness of machine learning intrusion detection systems. XGBoost and ensemble models showed superior stability and calibration, whereas SVM and Autoencoder were less reliable under dynamic high-traffic conditions. These findings emphasize the need for adaptive learning, drift monitoring, and statistically rigorous evaluation in real-world cybersecurity applications.
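As a rough illustration of the sliding-window evaluation idea, the sketch below scores time-ordered predictions window by window and summarizes the per-window F1 as a mean and standard deviation. The window size, step, and helper names are hypothetical, and reading the reported "±" as a standard deviation across windows is an assumption. This is not the authors' pipeline, and it does not implement ADWIN.

```python
from statistics import mean, stdev


def f1_score(y_true, y_pred):
    """F1 for the positive (attack = 1) class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sliding_window_f1(y_true, y_pred, window=4, step=2):
    """Score each window of a time-ordered stream without shuffling."""
    scores = []
    for start in range(0, len(y_true) - window + 1, step):
        end = start + window
        scores.append(f1_score(y_true[start:end], y_pred[start:end]))
    return scores


def summarize(scores):
    """Mean and sample standard deviation of the per-window scores."""
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    # Toy stream for illustration only; not taken from the paper.
    truth = [1, 0, 1, 0, 1, 1, 0, 0]
    preds = [1, 0, 1, 0, 0, 1, 1, 0]
    scores = sliding_window_f1(truth, preds, window=4, step=4)
    print(scores)  # [1.0, 0.5]
    print(summarize(scores))
```

A falling window score with a rising spread is the kind of signal a drift detector is meant to flag, which is why per-window results can tell a different story from a single offline score.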
KEYWORDS: Intrusion Detection Systems, Machine Learning, Statistical Reliability, High-Traffic Networks, Real-Time Evaluation, Concept Drift, Calibration, Bootstrap Confidence Intervals