The Malware Detection Using Deep Learning based R-CNN

  • Cho, Young-Bok (Department of Computer & Information Security, Daejeon University)
  • Received : 2018.05.20
  • Accepted : 2018.06.25
  • Published : 2018.06.30

Abstract

Recent advances in machine learning have drawn considerable attention to the techniques used to implement artificial intelligence, such as machine learning and deep learning. This paper converts binary malicious code into images, extracts features from those images, and classifies the malware family using a deep learning based R-CNN. The method has two deep learning steps: a CNN is used to render the malicious code as an image, and an R-CNN classifies the features that characterize each malware family. Once the code has been imaged, its features extracted, and its family assigned, the evolution of the malicious code is classified automatically. The proposed method achieves a detection rate of 93.4% and an accuracy of 98.6%. The CNN step that images the malicious code takes 23.3 ms, and the R-CNN step takes 4 ms to classify one sample. The experiments therefore show that malware can be identified and classified relatively quickly.
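The abstract does not specify how the binary is turned into an image. As a minimal sketch of one common approach, assuming each byte of the file is mapped to one 8-bit grayscale pixel and the pixels are laid out row by row, the imaging step could look like the following. The function name `bytes_to_grayscale`, the `width` parameter, and the zero padding of the last row are illustrative assumptions, not details taken from the paper.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay out raw bytes as rows of 8-bit grayscale pixels.

    Assumption (not from the paper): one byte maps to one pixel value
    in 0..255, and the final row is padded with zeros to a full width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad so that the byte count is an exact multiple of the row width.
    padded = data.ljust(height * width, b"\x00")
    # Iterating over bytes yields ints, so each row is a list of pixel values.
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical five-byte sample; a real input would be a whole executable.
image = bytes_to_grayscale(b"MZ\x90\x00\x03", width=4)
print(image)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

A two-dimensional array of this kind is what an image classifier such as a CNN would take as input. The row width is a free choice in this sketch and would normally be fixed, or chosen from the file size, so that all samples produce comparable images.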

