Design and Implementation of Web Mail Filtering Agent for Personalized Classification

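  • Ok-Ran Jeong (Department of Computer Science, Graduate School, Ewha Womans University) ;
  • Dong-Sub Cho (Department of Computer Science, Ewha Womans University)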
  • Published : 2003.12.01

Abstract

E-mail is widely used for personal purposes, and the number of e-mail users grows daily. The volume of mail exchanged in electronic commerce is also increasing. Because e-mail is so convenient, large amounts of spam are sent every day, yet automated techniques for learning to filter e-mail have not significantly affected the e-mail market. This paper proposes a Web Mail Filtering Agent for Personalized Classification, which automatically manages mail in a way that adapts to each user. It is based on web mail, which can be accessed at any time from any place and is not tied to a particular system. When new mail arrives, the agent first derives personal rules from the results of observation. Based on these personal rules, it automatically classifies mail into categories according to content, saves the classified mail in the relevant folders, and deletes unnecessary mail and spam. To improve the system's accuracy, we applied a Bayesian algorithm with a dynamic threshold.

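With the growth of the Internet, more documents are sent and received over the web, and the number of e-mail users is increasing exponentially. The volume of mail exchanged by ordinary users and in electronic commerce also continues to grow. Because e-mail is convenient, an enormous amount of spam is also sent every day. This paper proposes a Web Mail Filtering Agent for Personalized Classification, a web-mail-based agent that automatically manages mail to suit each individual user (that is, it supports personalized classification) and that can be logged into anytime and anywhere. When new mail arrives, the agent first observes how the user handles mail over a certain period and forms rules suited to that individual (personal rules). Based on these rules, it manages messages automatically: it classifies and saves them by category and deletes spam and mail the user does not need. In addition, a Bayesian algorithm using a dynamic threshold was applied to improve the accuracy of the system.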
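
The abstract names the technique but gives neither the classifier details nor the threshold rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a multinomial naive Bayes model with Laplace smoothing, one class per user folder, and a hypothetical feedback rule that raises the threshold after a wrong filing and lowers it after a correct one. The class name `PersonalMailFilter`, the starting threshold of 0.7, and the step of 0.05 are all illustrative choices.

```python
import math
from collections import Counter, defaultdict


class PersonalMailFilter:
    """Naive Bayes over user folders with an adjustable confidence threshold.

    Everything here is an illustrative assumption; the source does not
    specify the model details or how the threshold is updated.
    """

    def __init__(self, threshold=0.7, step=0.05):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> observed mails
        self.vocab = set()
        self.threshold = threshold               # hypothetical starting value
        self.step = step                         # hypothetical adjustment size

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def observe(self, text, folder):
        """Record how the user filed one mail (the observation phase)."""
        words = self.tokenize(text)
        self.word_counts[folder].update(words)
        self.doc_counts[folder] += 1
        self.vocab.update(words)

    def posteriors(self, text):
        """Return P(folder | text) for every observed folder."""
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for folder, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[folder].values())
            denom = total_words + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((self.word_counts[folder][w] + 1) / denom)
            log_scores[folder] = score
        # Normalize in log space to avoid underflow on long mails.
        top = max(log_scores.values())
        scaled = {f: math.exp(s - top) for f, s in log_scores.items()}
        norm = sum(scaled.values())
        return {f: v / norm for f, v in scaled.items()}

    def classify(self, text, default="inbox"):
        """File the mail only when the best folder clears the threshold."""
        if not self.doc_counts:
            return default
        probs = self.posteriors(text)
        folder = max(probs, key=probs.get)
        return folder if probs[folder] >= self.threshold else default

    def feedback(self, was_correct):
        """Assumed rule: stricter after a mistake, more permissive after a hit."""
        delta = -self.step if was_correct else self.step
        self.threshold = min(0.99, max(0.5, self.threshold + delta))


mail_filter = PersonalMailFilter()
mail_filter.observe("cheap pills buy now", "spam")
mail_filter.observe("win money now cheap", "spam")
mail_filter.observe("meeting agenda for project", "work")
mail_filter.observe("project deadline meeting notes", "work")
print(mail_filter.classify("cheap money now"))
print(mail_filter.classify("hello there"))
```

In this sketch, a mail whose best posterior falls below the threshold stays in the default folder instead of being filed, and a caller could treat the `spam` folder as the deletion target described in the abstract.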
