Development of an Automatic Classification System for Game Reviews Based on Word Embedding and Vector Similarity

  • Yang, Yu-Jeong (Department of Computer Science, Sookmyung Women's University)
  • Lee, Bo-Hyun (Division of Computer Science, Sookmyung Women's University)
  • Kim, Jin-Sil (Division of Computer Science, Sookmyung Women's University)
  • Lee, Ki Yong (Division of Computer Science, Sookmyung Women's University)
  • Received: 2019.03.25
  • Accepted: 2019.05.13
  • Published: 2019.05.31

Abstract

Game software must respond quickly to user feedback after launch, so providers need to identify users' needs promptly and reflect them in the game. However, most sites where users download games and post reviews, such as the Google Play Store, offer only very limited and ambiguous categories for classifying game reviews. In this paper, we therefore develop an automatic classification system that sorts game reviews into categories that are clearer and more useful to game providers. The system converts the words in each review into vectors using word2vec, a representative word embedding model, and assigns the review to the most relevant categories by measuring the similarity between those vectors and each category. Because the choice of similarity measure directly affects classification performance, we compare three representative measures on real data: Euclidean similarity, cosine similarity, and extended Jaccard similarity. To allow a review to belong to more than one category, we use a threshold-based multi-category classification method. In experiments on real reviews collected from the Google Play Store, the system achieved up to 95% accuracy.
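
As a rough illustration of the approach summarized in the abstract, the following Python sketch computes the three similarity measures and applies a threshold to assign a review to one or more categories. It is not the system described in the paper. Several details are assumptions made only for this example: the review vector is taken as the mean of its word vectors, each category is represented by a single vector, Euclidean similarity is defined as 1 / (1 + distance), the threshold is 0.7, and the three-dimensional embeddings are hand-written toy values rather than the output of a trained word2vec model. The extended Jaccard (Tanimoto) similarity uses the standard definition a·b / (|a|² + |b|² − a·b).

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_similarity(a, b):
    """Assumed mapping of Euclidean distance into (0, 1]: 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(a, b))


def extended_jaccard_similarity(a, b):
    """Extended Jaccard (Tanimoto) similarity for real-valued vectors."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


def mean_vector(vectors):
    """Component-wise mean (assumed way of pooling word vectors)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(tokens, embeddings, categories, similarity, threshold):
    """Return every category whose similarity to the review reaches the threshold."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []  # no known words, so no category can be assigned
    review_vector = mean_vector(vectors)
    return sorted(
        name
        for name, category_vector in categories.items()
        if similarity(review_vector, category_vector) >= threshold
    )


# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
TOY_EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "bug": [0.8, 0.2, 0.1],
    "price": [0.1, 0.9, 0.0],
    "expensive": [0.0, 0.8, 0.2],
    "fun": [0.1, 0.1, 0.9],
}

# Hypothetical categories, each represented by one keyword vector.
TOY_CATEGORIES = {
    "stability": TOY_EMBEDDINGS["bug"],
    "payment": TOY_EMBEDDINGS["price"],
    "gameplay": TOY_EMBEDDINGS["fun"],
}

labels = classify(
    ["crash", "expensive"],
    TOY_EMBEDDINGS,
    TOY_CATEGORIES,
    cosine_similarity,
    threshold=0.7,
)
print(labels)
```

With these toy values, a review mentioning both "crash" and "expensive" is assigned to both the "payment" and "stability" categories, which shows how a threshold allows multi-category output. Passing `euclidean_similarity` or `extended_jaccard_similarity` instead of `cosine_similarity` changes the scores, so the threshold would need to be tuned separately for each measure.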
