Performance Analysis of Similarity Reflecting Jaccard Index for Solving Data Sparsity in Collaborative Filtering

  • Received : 2016.06.02
  • Accepted : 2016.07.18
  • Published : 2016.07.30

Abstract

Reflecting the number of co-rated items has been studied as a way to address the data sparsity problem in collaborative filtering systems. The Jaccard index, a well-known method of this kind, has improved performance when combined with existing similarity measures. However, how much each similarity measure gains from this combination across different data environments has seldom been analyzed, and this study aims to provide that analysis. Used on its own as a similarity measure, the Jaccard index yielded much higher prediction quality than traditional measures, as well as very high recommendation quality, on a sparse dataset. In general, combining existing similarity measures with the Jaccard index improved their performance regardless of dataset characteristics. In particular, cosine similarity showed the largest improvement on sparse datasets, whereas the Mean Squared Difference similarity showed degraded prediction quality on denser datasets. Therefore, the characteristics of both the data environment and the similarity measure should be considered before combining a measure with the Jaccard index.
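As an illustration of the idea summarized above, the following minimal Python sketch computes the Jaccard index over the items two users have rated and uses it to weight a base similarity. The abstract does not state how the measures are combined, so the multiplicative combination, the function names (`jaccard`, `cosine`, `jaccard_weighted`), and the toy ratings are all assumptions made for illustration, not the study's actual procedure.

```python
import math


def jaccard(ratings_u, ratings_v):
    """Jaccard index: co-rated items divided by all items rated by either user."""
    items_u, items_v = set(ratings_u), set(ratings_v)
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


def cosine(ratings_u, ratings_v):
    """Cosine similarity computed over co-rated items only."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_weighted(ratings_u, ratings_v, base=cosine):
    """Assumed combination: multiply the base similarity by the Jaccard index."""
    return jaccard(ratings_u, ratings_v) * base(ratings_u, ratings_v)


# Hypothetical toy ratings: item id -> rating
alice = {"item1": 5, "item2": 3, "item3": 4}
bob = {"item2": 3, "item3": 4, "item4": 1, "item5": 2}

print(jaccard(alice, bob))           # share of co-rated items
print(cosine(alice, bob))            # agreement on co-rated items
print(jaccard_weighted(alice, bob))  # agreement discounted by low overlap
```

In this toy case the two users agree perfectly on the two items they share, so cosine similarity alone is maximal, while the Jaccard weight lowers the score because only two of the five items rated by either user overlap. This is the sense in which the number of co-rated items is reflected in the similarity when data are sparse.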
