A Collaborative Filtering System Combined with Users' Review Mining: Application to the Recommendation of Smartphone Apps

  • 전병국 (Graduate School of Business IT, Kookmin University) ;
  • 안현철 (Graduate School of Business IT, Kookmin University)
  • Received : 2015.05.20
  • Accepted : 2015.06.16
  • Published : 2015.06.30

Abstract

Collaborative filtering (CF) algorithms have been widely used in recommender systems in both academic research and practical applications. A typical CF system compares users by how similar they are and builds recommendations from the items favored by other users with similar tastes. Measuring the similarity between users is therefore critical for CF, because recommendation quality depends on it. In most cases, only users' explicit numeric ratings of items (i.e., quantitative information) have been used to calculate the similarities between users. However, several studies have indicated that qualitative information, such as users' reviews of items, may help measure these similarities more accurately. Since the advent of Web 2.0, many people share honest opinions on items they have recently purchased, so user reviews can be regarded as an informative source for identifying user preferences accurately. Against this background, this study proposes a new hybrid recommender system that combines CF with users' review mining. The proposed system is based on conventional memory-based CF, but it is designed to use both a user's numeric ratings and his or her text reviews of items when calculating similarities between users. Specifically, the system creates not only a user-item rating matrix but also a user-item review term matrix. It then calculates a rating similarity and a review similarity from the respective matrices, and derives the final user-to-user similarity from these two similarities. For calculating the review similarity between users, we propose two alternatives: one uses the frequency of the commonly used terms, and the other uses the sum of the importance weights of the commonly used terms in users' reviews.
For the importance weights of terms, we propose using average TF-IDF (Term Frequency - Inverse Document Frequency) weights. To validate the applicability of the proposed system, we applied it to a recommender system for smartphone applications (hereafter, apps). At present, over a million apps are offered in each of the app stores operated by Google and Apple. Because of this information overload, users have difficulty selecting the apps they really want. Furthermore, app store operators such as Google and Apple have accumulated a huge amount of user reviews on apps. We therefore chose smartphone app stores as the application domain of our system. To collect the experimental data set, we built and operated a Web-based data collection system for about two weeks. As a result, we obtained 1,246 valid responses (ratings and reviews) from 78 users. The experimental system was implemented using Microsoft Visual Basic for Applications (VBA) and SAS Text Miner. To avoid distortion due to human intervention, we did not apply any manual refinement during the review mining process. To examine the effectiveness of the proposed system, we compared its performance with that of a conventional CF system. The performance of the recommender systems was evaluated using the average MAE (mean absolute error). The experimental results showed that our proposed system (MAE = 0.7867 ~ 0.7881) slightly outperformed a conventional CF system (MAE = 0.7939). They also showed that calculating the review similarity between users based on the TF-IDF weights (MAE = 0.7867) led to better recommendation accuracy than the calculation based on the frequency of the commonly used terms in reviews (MAE = 0.7881).
A paired-samples t-test showed that our proposed system with review similarity calculated from the frequency of the commonly used terms outperformed the conventional CF system at the 10% significance level. Our study sheds light on the application of users' review information for facilitating electronic commerce by recommending proper items to users.
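
The two review-similarity alternatives described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the toy `reviews` data, whitespace tokenization, treating each review as one document, the particular TF-IDF variant, and the hypothetical linear blend in `combined_similarity` (the abstract only says the final similarity is based on both the rating and review similarities). The study itself was implemented with VBA and SAS Text Miner, not Python.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the study): review text per user and app.
reviews = {
    "u1": {"appA": "fast simple camera", "appB": "simple game fun"},
    "u2": {"appA": "fast camera filters", "appC": "fun puzzle game"},
    "u3": {"appB": "boring ads", "appC": "too many ads"},
}

def user_terms(user):
    """Distinct terms a user wrote across all of their reviews."""
    return {t for text in reviews[user].values() for t in text.split()}

def common_term_count(u, v):
    """Alternative 1 (assumed form): number of terms both users used."""
    return len(user_terms(u) & user_terms(v))

def average_tfidf():
    """Average TF-IDF weight of each term over the reviews that contain it.

    Assumption: each review is one document, TF is the relative term
    frequency, and IDF is log(N / document frequency).
    """
    docs = [text.split() for per_user in reviews.values()
            for text in per_user.values()]
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    sums = Counter()
    for d in docs:
        for term, count in Counter(d).items():
            sums[term] += (count / len(d)) * math.log(n_docs / df[term])
    return {t: sums[t] / df[t] for t in sums}

def common_term_weight(u, v, weights):
    """Alternative 2 (assumed form): sum of average TF-IDF weights of shared terms."""
    return sum(weights[t] for t in user_terms(u) & user_terms(v))

def combined_similarity(rating_sim, review_sim, alpha=0.5):
    """Hypothetical linear blend of rating and review similarity.

    `alpha` is an illustrative parameter; the abstract does not state
    how the two similarities are combined.
    """
    return alpha * rating_sim + (1 - alpha) * review_sim

weights = average_tfidf()
count_sim = common_term_count("u1", "u2")
weight_sim = common_term_weight("u1", "u2", weights)
```

In this toy data, users `u1` and `u2` share four terms, while `u1` and `u3` share none, so both alternatives rank `u2` as the closer neighbor of `u1`. In practice the raw review similarity would likely need to be scaled to a range comparable with the rating similarity before blending; that step is left out here because the source does not describe it.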

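Collaborative filtering is a recommendation technique widely used in academia and industry because of its strong performance, but because it generates recommendations solely from users' rating scores, which are quantitative information, its accuracy sometimes suffers. Various studies have therefore tried to improve the performance of collaborative filtering by taking additional information into account. Noting that, with the arrival of the Web 2.0 era, users freely express honest opinions about purchased products on the Internet, this study proposes a new recommendation algorithm that improves the performance of collaborative filtering by referring to reviews written by users themselves, and applies it to a smartphone app recommender system. Text mining is used to quantify user reviews, which are qualitative information. Specifically, when calculating the similarity between users, the proposed recommender system additionally reflects the similarity of user reviews so that user-to-user similarity can be calculated more precisely. Two approaches to calculating the similarity of user reviews are presented and compared: one based on the frequency of index terms used in common, and the other based on the sum of TF-IDF (Term Frequency - Inverse Document Frequency) weights. The experiments confirmed that recommendation with the proposed algorithm, that is, the algorithm that additionally reflects the similarity of user reviews, achieves better prediction accuracy than traditional collaborative filtering that considers ratings only. They also confirmed that considering the sum of the TF-IDF weights of commonly used words yields slightly better prediction accuracy than considering only the frequency of commonly used words.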
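
The prediction accuracy compared above is measured with MAE (mean absolute error). As a minimal sketch of that metric, the Python snippet below averages the absolute difference between actual and predicted ratings. The rating values are hypothetical and are not taken from the study's data set.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical ratings for four user-app pairs (illustrative only).
actual = [5, 3, 4, 1]
predicted = [4.5, 3.5, 3.0, 2.0]

# Absolute errors are 0.5, 0.5, 1.0, 1.0, so the MAE is 3.0 / 4 = 0.75.
mae = mean_absolute_error(actual, predicted)
```

A lower MAE means predicted ratings are closer to the ratings users actually gave, which is why the proposed system's 0.7867 to 0.7881 is read as a slight improvement over the conventional system's 0.7939.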
