Development of Associative Rank Decision Function Using Basic Association Rule Thresholds

기본적인 연관기준값을 이용한 연관성 순위 결정 함수의 개발

  • Published : 2010.04.30

Abstract

Data mining is the process of finding useful information in large databases. One of its most studied problems is the search for association rules, that is, association relationships among the items in a database. Three primary quality measures are used for an association rule: support, confidence, and lift. Given user-defined minimum support and minimum confidence thresholds, conventional association rule mining finds all rules that meet both minimums. In this paper, we develop an associative rank decision function that generates association rules for items satisfying at least one of the three criteria. We compare our function with the one suggested by Wu et al. (2004) using examples. Our decision function performed better than that of Wu et al. because its value always lies between -1 and 1, regardless of the range of the three association thresholds. It takes the value 1 when all three measures exceed their criteria and -1 when all three fall below them.

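Data mining is the process of uncovering useful information that is not readily apparent in large databases, and its most actively studied area is the discovery of association rules. Association rules are used to find relationships among items; as a data mining technique, they identify meaningful rules by quantifying the relationship between two items on the basis of support, confidence, and lift. In this paper, we develop an association decision function for cases in which the three basic interestingness measures do not all reach their association thresholds but at least one of them does; the function ranks such cases so that they can be generated as association rules. Overall, the simulation results show that the proposed function always takes a value between -1 and 1 regardless of the range of the minimum association thresholds, taking the value 1 when all thresholds are satisfied and -1 when none is satisfied.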
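
As a rough illustration of the three measures named in the abstract, the following Python sketch computes support, confidence, and lift for a single rule X → Y and reports which user-defined minimum thresholds the rule meets. The function names (`rule_measures`, `thresholds_met`), the toy transactions, and the threshold values are all assumptions made for this example. The sketch does not reproduce the rank decision function proposed in the paper, whose formula is not given in the abstract.

```python
from typing import Dict, List, Set


def rule_measures(transactions: List[Set[str]],
                  antecedent: Set[str],
                  consequent: Set[str]) -> Dict[str, float]:
    """Compute support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    # Count transactions containing X, Y, and both X and Y.
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_xy / n                             # P(X and Y)
    confidence = n_xy / n_x if n_x else 0.0        # P(Y | X)
    lift = confidence / (n_y / n) if n_y else 0.0  # P(Y | X) / P(Y)
    return {"support": support, "confidence": confidence, "lift": lift}


def thresholds_met(measures: Dict[str, float],
                   minimums: Dict[str, float]) -> Dict[str, bool]:
    """Report, per measure, whether it reaches its minimum threshold."""
    return {name: measures[name] >= minimums[name] for name in minimums}


# Hypothetical toy data and thresholds, chosen only for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
MINIMUMS = {"support": 0.5, "confidence": 0.7, "lift": 1.0}

if __name__ == "__main__":
    m = rule_measures(TRANSACTIONS, {"bread"}, {"milk"})
    print(m)
    print(thresholds_met(m, MINIMUMS))
```

In this toy data the rule bread → milk reaches the support and confidence minimums but not the lift minimum, which is the kind of partially satisfied case the paper's ranking function is intended to handle.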

References

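  1. 강현철, 한상태, 정병철, 신연주 (2004). 개인화를 위한 추천시스템 알고리즘에 관한 연구 [A study on recommender system algorithms for personalization], Journal of the Korean Data Analysis Society, Vol. 6, No. 4, pp. 1043-1049.
  2. 김민환, 박희창 (2007). 연관성 규칙을 이용한 왜곡변수 발견에 관한 연구 [A study on discovering distorting variables using association rules], Journal of the Korean Data Analysis Society, Vol. 8, No. 2, pp. 711-719.
  3. 박희창, 조광현 (2005). 연관성규칙을 이용한 지역정보와 통합된 폐기물 데이터 분석 [Analysis of waste data integrated with regional information using association rules], Journal of the Korean Data Analysis Society, Vol. 7, No. 3, pp. 763-772.
  4. 조광현, 박희창 (2007). 연관성 발견을 위한 군집분석의 적용 방안 [An approach to applying cluster analysis for association discovery], Journal of the Korean Data Analysis Society, Vol. 9, No. 6, pp. 2919-2930.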
  5. Agrawal, R., Imielinski, T. and Swami, A. (1993). Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 207-216.
  6. Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules, Proceedings of the 20th VLDB Conference, pp. 487-499.
  7. Liu, B., Hsu, W., Chen, S. and Ma, Y. (2000). Analyzing the subjective interestingness of association rules, IEEE Intelligent Systems, Vol. 15, No. 5, pp. 47-55. https://doi.org/10.1109/5254.889106
  8. Cai, C. H., Fu, A. W. C., Cheng, C. H. and Kwong, W. W. (1998). Mining association rules with weighted items, Proceedings of International Database Engineering and Applications Symposium, pp. 68-77.
  9. Freitas, A. (1999). On rule interestingness measures, Knowledge-Based Systems, Vol. 12, pp. 309-315. https://doi.org/10.1016/S0950-7051(99)00019-2
  10. Han, J. and Fu, Y. (1999). Mining multiple-level association rules in large databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 5, pp. 68-77.
  11. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation, Proceedings of ACM SIGMOD Conference on Management of Data, pp. 1-12.
  12. Hilderman, R. J. and Hamilton, H. J. (2000). Applying Objective Interestingness Measures in Data Mining Systems, Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 432-439.
  13. Liu, B., Hsu, W. and Ma, Y. (1999). Mining association rules with multiple minimum supports, Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, pp. 337-341.
  14. Park, J. S., Chen, M. S. and Yu, P. S. (1995). An effective hash-based algorithm for mining association rules, Proceedings of ACM SIGMOD Conference on Management of Data, pp. 175-186.
  15. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules, Proceedings of the 7th International Conference on Database Theory, pp. 398-416.
  16. Pei, J., Han, J., and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets, Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21-30.
  17. Silberschatz, A. and Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, pp. 970-974. https://doi.org/10.1109/69.553165
  18. Tan, P. N., Kumar, V. and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 32-41.
  19. Wu, X., Zhang, C. and Zhang, S. (2004). Efficient mining of both positive and negative association rules, ACM Transactions on Information Systems, Vol. 22, No. 3, pp. 381-405. https://doi.org/10.1145/1010614.1010616