Feature Reduction Using Logistic Regression for Spam Filtering

  • 정용규 (Department of Medical IT Marketing, Eulji University);
  • 이범준 (Medical Computer Science Major, Division of Medical Industry, Eulji University)
  • Received : 2010.04.01
  • Published : 2010.04.30

Abstract

Spam now occupies a large share of mail-server and network storage, which causes problems such as network overload, and users must spend time and resources deleting it. Automatic spam filtering has therefore become essential to solving the problem. Unlike conventional approaches such as the Naive Bayesian method, this work uses principal component analysis (PCA) to reduce a high-dimensional spam data set to a few principal axes, which lowers the computational burden, and then applies logistic regression analysis to classify messages into groups and filter spam. The approach yielded positive results in both speed and performance.
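
The pipeline described in the abstract (PCA for dimensionality reduction, then logistic regression for classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic six-feature data, the power-iteration PCA, the gradient-descent fit, and every function name and parameter value below are assumptions chosen only to keep the example self-contained in standard-library Python.

```python
import math
import random

def make_toy_data(n_per_class=40, seed=1):
    """Synthetic stand-in for word-frequency features (hypothetical data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (1, 0):  # 1 = spam, 0 = legitimate mail
        for _ in range(n_per_class):
            spammy = [rng.gauss(3.0 if label else 0.5, 0.5) for _ in range(3)]
            hammy = [rng.gauss(0.5 if label else 3.0, 0.5) for _ in range(3)]
            rows.append(spammy + hammy)
            labels.append(label)
    return rows, labels

def pca_components(rows, k, iters=200):
    """Top-k principal axes via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the axis just found.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next axis.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, components

def project(row, mean, components):
    """Coordinates of one sample on the retained principal axes."""
    return [sum((row[j] - mean[j]) * c[j] for j in range(len(row)))
            for c in components]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(x, w, b):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

rows, labels = make_toy_data()
mean, components = pca_components(rows, k=2)          # 6 features -> 2 axes
reduced = [project(r, mean, components) for r in rows]
weights, bias = fit_logistic(reduced, labels)
accuracy = sum(predict(z, weights, bias) == y
               for z, y in zip(reduced, labels)) / len(labels)
print("training accuracy on toy data:", accuracy)
```

The classifier here is trained on two projected coordinates instead of the six original features, which is the source of the reduced computation the abstract refers to. The accuracy printed is measured on the same toy data used for training, so it says nothing about the results reported in the paper.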
