RFA: Recursive Feature Addition Algorithm for Machine Learning-Based Malware Classification

  • Byeon, Ji-Yun (Dept. of Cyber Security, Yeungnam University College)
  • Kim, Dae-Ho (Dept. of Cyber Security, Yeungnam University College)
  • Kim, Hee-Chul (Dept. of Cyber Security, Yeungnam University College)
  • Choi, Sang-Yong (Dept. of Cyber Security, Yeungnam University College)

  • Received: 2020.12.31
  • Accepted: 2021.01.28
  • Published: 2021.02.26

Abstract

In recent years, a variety of techniques that use machine learning to distinguish malicious code from benign binaries have been studied. For machine learning to be effective, it is most important to extract features that identify malicious code and benign binaries well. In this paper, we propose RFA (Recursive Feature Addition), a recursive feature extraction method for use in machine learning. To maximize machine learning performance, the proposed method applies a recursive procedure to individual features and selects the final feature set. Specifically, at each stage it extracts the best-performing feature among the individual features and combines it with the features already extracted. We applied features extracted with the proposed method to machine learning algorithms such as Decision Tree, SVM, Random Forest, and KNN, and verified that machine learning performance improves as the stages continue.
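The stage-by-stage procedure described in the abstract can be read as a greedy forward selection. The following is a minimal sketch under that reading, not the authors' implementation. The abstract does not specify a stopping rule or an evaluation protocol, so the sketch assumes that every stage is run and recorded, and it uses a leave-one-out 1-nearest-neighbor accuracy in pure Python as a stand-in for the classifiers named in the abstract. The feature names (`entropy`, `n_imports`, `noise`), the toy samples, and both function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy data: (feature values, label), label 1 = malicious, 0 = benign.
SAMPLES: List[Tuple[Dict[str, float], int]] = [
    ({"entropy": 7.5, "n_imports": 5.0, "noise": 0.1}, 1),
    ({"entropy": 7.8, "n_imports": 40.0, "noise": 0.9}, 1),
    ({"entropy": 7.2, "n_imports": 12.0, "noise": 0.5}, 1),
    ({"entropy": 4.1, "n_imports": 38.0, "noise": 0.2}, 0),
    ({"entropy": 5.0, "n_imports": 8.0, "noise": 0.8}, 0),
    ({"entropy": 4.6, "n_imports": 14.0, "noise": 0.4}, 0),
]
FEATURES = ["entropy", "n_imports", "noise"]


def loo_1nn_accuracy(features: List[str]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the given features (assumed stand-in scorer)."""
    correct = 0
    for i, (x, label) in enumerate(SAMPLES):
        others = [s for j, s in enumerate(SAMPLES) if j != i]
        # Nearest other sample by squared Euclidean distance.
        _, nearest_label = min(
            others,
            key=lambda s: sum((s[0][f] - x[f]) ** 2 for f in features),
        )
        correct += int(nearest_label == label)
    return correct / len(SAMPLES)


def recursive_feature_addition(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[List[str], float]]:
    """At each stage, add the remaining feature that scores best when
    combined with the features selected so far. Returns one
    (feature set, score) entry per stage."""
    selected: List[str] = []
    remaining = list(candidates)
    history: List[Tuple[List[str], float]] = []
    while remaining:
        # Evaluate each remaining feature together with the current set.
        scored = [(score(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(scored, key=lambda p: p[0])
        selected = selected + [best_feature]
        remaining.remove(best_feature)
        history.append((list(selected), best_score))
    return history


history = recursive_feature_addition(FEATURES, loo_1nn_accuracy)
for stage, (feature_set, acc) in enumerate(history, start=1):
    print(f"stage {stage}: {feature_set} accuracy={acc:.2f}")
```

Passing the scorer as a callable keeps the selection loop independent of the classifier, so the same loop could be driven by any of the algorithms the abstract lists. The toy features are deliberately left unscaled, so the scores of later stages in this example say nothing about the paper's results.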
