Improving Performance of Recommendation Systems Using Topic Modeling

사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안

  • Choi, Seongi (Graduate School of Business IT, Kookmin University) ;
  • Hyun, Yoonjin (Graduate School of Business IT, Kookmin University) ;
  • Kim, Namgyu (School of Management Information Systems, Kookmin University)
  • Received : 2015.08.31
  • Accepted : 2015.09.09
  • Published : 2015.09.30

Abstract

Recently, the spread of smart devices and social media has led to the accumulation of vast amounts of information in various forms. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is shifting from structured to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method that quantifies users' interests by analyzing their internet usage patterns and predicts users' repurchases based on the discovered preferences. The methodology has two important modules. The first module extracts users' purchase history and predicts the repurchase probability of each category from it. We include the first module in our research scope in order to compare the accuracy of a traditional purchase-based prediction model with that of our new model presented in the second module.
The core of our methodology is the second module, which extracts users' interests by analyzing the news articles the users have read. The second module first constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. It then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging these two results, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category.
We also report the experimental results of our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data for a panel of 5,000 users from a company that specializes in analyzing the rankings of internet sites. We first extracted from the original data 15,000 URLs of news articles published from July 2012 to June 2013 and crawled the main content of those articles. We then selected the 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 target users had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model, which uses both users' interests and purchase history, outperforms a prediction model that uses only users' purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used. Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used, although the improvement was very small.
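
The merging step of the second module can be illustrated with a minimal sketch. It assumes that the article-user matrix holds read counts and that the topic-article matrix holds per-article topic weights, so that a user's interest in a topic is the count-weighted sum of topic weights over the articles the user read. The function name, the toy data, and the row normalization are hypothetical choices made for illustration; the abstract does not specify how the two matrices are actually merged.

```python
from collections import defaultdict


def user_topic_matrix(user_article, article_topic):
    """Merge a user-article matrix with an article-topic matrix.

    user_article:  {user: {article: read_count}}
    article_topic: {article: {topic: weight}}
    Returns {user: {topic: interest}}.

    Assumption (not stated in the abstract): each user's row is
    normalized so that the interest values sum to 1.
    """
    result = {}
    for user, reads in user_article.items():
        scores = defaultdict(float)
        for article, count in reads.items():
            # Articles without topic weights contribute nothing.
            for topic, weight in article_topic.get(article, {}).items():
                scores[topic] += count * weight
        total = sum(scores.values())
        result[user] = {
            topic: (score / total if total else 0.0)
            for topic, score in scores.items()
        }
    return result


# Toy data (hypothetical): two articles, two topics, two users.
article_topic = {
    "a1": {"economy": 0.8, "sports": 0.2},
    "a2": {"economy": 0.1, "sports": 0.9},
}
user_article = {
    "u1": {"a1": 2, "a2": 1},
    "u2": {"a2": 3},
}

interests = user_topic_matrix(user_article, article_topic)
for user, row in interests.items():
    print(user, {topic: round(value, 3) for topic, value in row.items()})
```

Each row of the result could then be appended to the purchase-history features of the corresponding user before a classifier such as a decision tree or a neural network is trained, which is one plausible reading of how the second module uses the matrix.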

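Many organizations have made data-driven decisions, and structured data, including numerical data, has been widely used for this purpose. Recently, however, as the development of smart devices and social media has led to vast amounts of information in various forms being generated, shared, and stored, interest has been shifting from traditional decision-making based on structured data to decision-making based on unstructured big data. In the field of recommendation systems, a representative area of data-driven decision-making, the need to use unstructured data to improve performance has also been raised steadily in recent years. In particular, because users' tendencies and preferences are directly tied to customer needs, efforts to identify users' tendencies through unstructured data analysis, and thereby improve the accuracy of product recommendation and purchase prediction, are urgently needed. This study therefore proposes a way to ultimately improve the performance of recommendation systems by measuring users' tendencies and raising repurchase prediction accuracy, particularly per-category repurchase prediction accuracy. Specifically, we propose a model that analyzes users' everyday internet usage records to identify the issues covered by the news articles that customers view, quantifies customers' interest in various issues, and then uses this information to predict whether customers will repurchase in each category. In experiments on internet news viewing records and shopping mall purchase records derived from actual web transactions, the proposed model, which uses both customers' past purchase history and their issues of interest, was somewhat more accurate than a category repurchase prediction model that uses only customers' past purchase history.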
