Optimizing Similarity Threshold and Coverage of CBR

사례기반추론의 유사 임계치 및 커버리지 최적화

  • Received : 2013.05.14
  • Accepted : 2013.05.31
  • Published : 2013.08.31

Abstract

Case-based reasoning (CBR) has many advantages and has therefore been used to support decision making in various areas, including medical diagnosis, production planning, and customer classification. However, several design factors must be set heuristically when building an effective CBR system. Among these factors, this study addresses the selection of appropriate neighbors in the case retrieval step. As the selection criterion, conventional studies have used either a preset number of neighbors to combine (i.e., the k of k-nearest neighbor) or a relative proportion of the maximum similarity. In contrast, this study proposes using an absolute similarity threshold, varying from 0 to 1, as the criterion for selecting the neighbors to combine. With this approach, a threshold value that is too small may cause the model to rarely produce a solution. To avoid this, we adopt coverage, defined as the ratio of the cases for which solutions are produced to the total number of training cases, and set it as a constraint when optimizing the similarity threshold. To validate the usefulness of the proposed model, we applied it to a real-world target marketing case of an online shopping mall in Korea. The results show that the proposed model can significantly improve the performance of CBR.

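Case-based reasoning (CBR) has many advantages and has therefore been applied to decision support in various areas, including medical diagnosis, production planning, and customer classification. However, designing and building an effective CBR system involves many design factors that the researcher must set by intuition. Among these factors, this study presents a new model for selecting more effectively the neighbor cases to combine in the case retrieval step. Previous studies have selected the neighbors to combine either by applying a preset number of neighbors (the k of k-NN) or by using a relative proportion of the maximum similarity as the threshold. In contrast, this study proposes using an absolute similarity threshold, with a value between 0 and 1, as a new criterion for selecting the similar cases to combine. In this setting, if the threshold value becomes too small, the model may often fail to produce a prediction. We therefore define coverage as the proportion of training cases for which a prediction is produced and set it as a constraint when optimizing the similarity threshold, so that the model finds and reasons from the most effective similar cases while maintaining the level of coverage that the user wants. To validate the usefulness of the proposed model, we applied it to the target marketing case of a real online shopping mall in Korea. The results confirm that the proposed model can significantly improve the prediction performance of CBR.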
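
The abstract does not spell out the similarity measure or the search procedure, so the following Python sketch is only one possible reading of the idea. It assumes features scaled to [0, 1], a normalized Euclidean distance as the score that is compared with the threshold, and a threshold that bounds this score from above, so that smaller values are stricter (in line with the remark that too small a value rarely produces a solution). The function names, the leave-one-out evaluation, the majority vote, the grid search, and the toy case base are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance scaled to [0, 1], assuming every feature lies in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def predict_with_threshold(query, cases, threshold):
    """Majority vote over all stored cases whose distance to the query is
    at most `threshold`. Returns None when no case qualifies."""
    votes = Counter(
        label for features, label in cases
        if distance(query, features) <= threshold
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def coverage_and_accuracy(cases, threshold):
    """Leave-one-out evaluation (an assumption of this sketch).
    Coverage is the share of cases that receive a prediction;
    accuracy is measured on the covered cases only."""
    produced = correct = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        predicted = predict_with_threshold(features, rest, threshold)
        if predicted is not None:
            produced += 1
            correct += predicted == label
    coverage = produced / len(cases)
    accuracy = correct / produced if produced else 0.0
    return coverage, accuracy


def best_threshold(cases, min_coverage, steps=20):
    """Grid search over thresholds in [0, 1]; keeps the most accurate
    threshold whose coverage satisfies the constraint.
    Returns (threshold, accuracy, coverage), or None if none qualifies."""
    best = None
    for k in range(steps + 1):
        t = k / steps
        coverage, accuracy = coverage_and_accuracy(cases, t)
        if coverage >= min_coverage and (best is None or accuracy > best[1]):
            best = (t, accuracy, coverage)
    return best


# Hypothetical toy case base: (features, label) pairs.
cases = [
    ((0.10, 0.10), "buy"),
    ((0.15, 0.10), "buy"),
    ((0.90, 0.90), "skip"),
    ((0.85, 0.90), "skip"),
    ((0.50, 0.50), "buy"),
]

if __name__ == "__main__":
    print(best_threshold(cases, min_coverage=0.8))
    print(best_threshold(cases, min_coverage=1.0))
```

In this toy setting, raising `min_coverage` forces the search toward looser thresholds, because the isolated case at (0.50, 0.50) only receives a prediction once more distant neighbors are admitted. A grid search is used here purely for simplicity; any optimizer that respects the coverage constraint could take its place.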
