Traffic Data Generation Technique for Improving Network Attack Detection Using Deep Learning

네트워크 공격 탐지 성능향상을 위한 딥러닝을 이용한 트래픽 데이터 생성 연구

  • Lee, Wooho (Interdisciplinary Program of Information Security, Chonnam National University) ;
  • Hahm, Jaegyoon (Div. of National Supercomputing, Korea Institute of Science and Technology Information) ;
  • Jung, Hyun Mi (Div. of National Supercomputing, Korea Institute of Science and Technology Information) ;
  • Jeong, Kimoon (Div. of National Supercomputing, Korea Institute of Science and Technology Information)
  • 이우호 (전남대학교 정보보안협동과정) ;
  • 함재균 (한국과학기술정보연구원 슈퍼컴퓨팅본부) ;
  • 정현미 (한국과학기술정보연구원 슈퍼컴퓨팅본부) ;
  • 정기문 (한국과학기술정보연구원 슈퍼컴퓨팅본부)
  • Received : 2019.09.27
  • Accepted : 2019.11.20
  • Published : 2019.11.28

Abstract

Recently, various machine learning approaches to network attack detection have been studied and applied, both to detect new attacks and to increase precision. However, machine learning methods depend on feature extraction, which is time-consuming and complex. Their performance is also limited by imbalance in the training data. In this study, among the limitations of detection systems, we address the degradation of classification performance caused by imbalanced training data. To do this, we generate data using Generative Adversarial Networks (GANs) and propose a classification method using Convolutional Neural Networks (CNNs). With this approach, we confirm that accuracy improves when it is applied to the NSL-KDD and UNSW-NB15 datasets.

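Research that applies machine learning to network attack detection has grown rapidly in recent years. These machine learning methods depend on large amounts of data, and a variety of experimental datasets have been released and used for research. However, both the experimental datasets and the data collected in real environments suffer from an imbalance in the number of samples per class. Among the limitations of machine-learning-based intrusion detection systems, this study proposes a method to address the degradation of classification performance caused by class imbalance in the training data. To this end, we processed network traffic data and used seqGAN to generate the under-represented data. When the proposed method was tested by classifying the NSL-KDD and UNSW-NB15 datasets with Text-CNN, precision improved.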
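As a rough illustration of the class imbalance problem described in the abstract, the sketch below counts samples per class and computes how many generated samples each class would need to match the largest class. It is not the authors' pipeline: the function name `synthetic_quota`, the balancing target, and the toy label counts are all assumptions made for illustration, and the generator itself is out of scope here.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return how many synthetic samples each class needs to reach `target`.

    Hypothetical helper: if `target` is None, the size of the largest
    class is used, so every class ends up with the same count.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no extra samples.
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Toy label list with made-up counts (not taken from NSL-KDD or UNSW-NB15).
toy_labels = ["normal"] * 8 + ["dos"] * 5 + ["u2r"] * 1
quota = synthetic_quota(toy_labels)
print(quota)
```

A quota like this would only tell a generator how many samples to produce per class. Whether full balancing is the right target is a design choice that the abstract does not specify.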
