Exploration of PIM Based Similarity Measures with PMP as Association Rule Thresholds

Using similarity measures based on the probabilistic interestingness measure with partially marginal proportions as association evaluation criteria

  • Published : 2012.12.30

Abstract

Association rule mining is a method for quantifying the relationships among sets of items in a large database, and it has been applied in various fields such as healthcare, insurance, education, and internet shopping malls. There are three primary measures for association rules: support, confidence, and lift. In most cases, association rules are generated using confidence. Confidence is the most important of these measures, but it is asymmetric and takes only positive values, which causes difficulties when generating association rules. In this paper, we apply similarity measures based on the probabilistic interestingness measure (PIM) with partially marginal proportions (PMP) to address this problem. Using a numerical example, we compare support, the two confidences, lift, and several PIM-based similarity measures with PMP. The results show that the PIM-based similarity measures with PMP indicate the degree of association in the same way as confidence does. Moreover, because their values carry a sign, they also reveal the direction of association, and we were able to select the best PIM-based similarity measure.

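Association rule mining identifies the relationships among items contained in a large database, and it is widely applied in practice in fields such as marketing, shopping malls, health and medical care, and education. Evaluation criteria such as support, confidence, and lift are used to mine these association rules. Among them, confidence, the most central measure, is an asymmetric measure that always takes positive values, which causes various difficulties in generating association rules between items. To solve this problem, this paper considers using similarity measures based on the probabilistic interestingness measure with partially marginal proportions as association evaluation criteria. As a result, all of these similarity measures could identify the degree of association, as the existing association evaluation criteria do, and, because they carry a sign, could also indicate the direction of association.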
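As a rough illustration of the problem described in the abstract, the following sketch computes support, the two confidences, and lift from the four cell counts of a 2x2 co-occurrence table for items A and B. The function name, the argument names, and the counts are hypothetical and are not taken from the paper's numerical example. The signed quantity P(A and B) - P(A)P(B), often called leverage, is used here only as a stand-in to show what a signed measure adds; it is not one of the paper's PIM-based similarity measures with PMP.

```python
def rule_measures(n_ab, n_a_only, n_b_only, n_neither):
    """Basic association measures from a 2x2 co-occurrence table.

    n_ab      : transactions containing both A and B
    n_a_only  : transactions containing A but not B
    n_b_only  : transactions containing B but not A
    n_neither : transactions containing neither item
    """
    n = n_ab + n_a_only + n_b_only + n_neither
    p_ab = n_ab / n
    p_a = (n_ab + n_a_only) / n
    p_b = (n_ab + n_b_only) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both items must occur at least once")
    return {
        "support": p_ab,
        "confidence_a_to_b": p_ab / p_a,  # P(B | A)
        "confidence_b_to_a": p_ab / p_b,  # P(A | B)
        "lift": p_ab / (p_a * p_b),
        # Stand-in signed measure (leverage), NOT the paper's measure.
        "leverage": p_ab - p_a * p_b,
    }


# Hypothetical counts where A and B occur together more often than chance.
pos = rule_measures(n_ab=100, n_a_only=300, n_b_only=100, n_neither=500)

# Hypothetical counts where A and B occur together less often than chance.
neg = rule_measures(n_ab=50, n_a_only=350, n_b_only=250, n_neither=350)

for name, m in (("positive case", pos), ("negative case", neg)):
    print(name)
    for key, value in m.items():
        print(f"  {key}: {value:.4f}")
```

In both cases the two confidences differ from each other (asymmetry) and stay positive, so confidence alone does not reveal that the second pair of items is negatively associated. The lift drops below 1 and the stand-in signed measure turns negative in that case, which is the kind of directional information the abstract attributes to the signed similarity measures.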
