A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier

Building a Text Mining-Based Sentiment Classifier for Sentiment Analysis of Movie Reviews (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung (김유영; Graduate School of Library and Information Science, Yonsei University)
  • Song, Min (송민; Department of Library and Information Science, Yonsei University)
  • Received : 2016.07.28
  • Accepted : 2016.09.07
  • Published : 2016.09.30

Abstract

Sentiment analysis identifies the emotions or sentiments embedded in user-generated data such as customer reviews on blogs and social network services. Fields such as computer science and business management use it to analyze customer-generated opinions. Previous studies often treat the star rating of a review as equivalent to the sentiment expressed in its text. However, the rating does not always correspond to the sentiment polarity of the text, and this assumption limits the accuracy of those studies. To address this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. The study has two aims: to propose an advanced sentiment classifier and to examine the correlation between movie reviews and box-office success. The classifier is built on two supervised machine learning techniques, Support Vector Machines (SVM) and a Feedforward Neural Network (FNN). It assigns sentiment scores to movie reviews, and these scores are then analyzed for statistical correlation with box-office success. Reviews were collected together with their star ratings. The dataset consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed classifier outperforms a Naive Bayes (NB) classifier, with accuracy about 6% higher. The results also indicate a positive correlation between star ratings and audience numbers, which serve as the measure of a movie's box-office success, and a mild positive correlation between the sentiment scores estimated by the classifier and audience numbers. To verify the applicability of the sentiment scores, an independent-samples t-test was conducted: the movies were divided into two groups by the average sentiment score, and the two groups differed significantly in their star ratings.
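
As a rough illustration of the statistical steps summarized above, the following sketch computes a Pearson correlation between per-movie scores and audience numbers, then splits the movies at the mean sentiment score and compares the star ratings of the two groups with a pooled-variance independent-samples t statistic. The data values, the function names (`pearson_r`, `independent_t`), and the choice of Pearson correlation and a pooled-variance test are assumptions made for illustration; the abstract does not specify them.

```python
import math
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def independent_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and its
    degrees of freedom. Each group needs at least two observations."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-movie values, NOT taken from the paper:
# (mean sentiment score, mean star rating, audience size)
movies = [
    (0.82, 8.9, 5_200_000),
    (0.75, 8.1, 3_100_000),
    (0.40, 6.0, 900_000),
    (0.55, 7.2, 1_500_000),
    (0.30, 5.1, 400_000),
    (0.68, 7.9, 2_600_000),
]
sentiment = [m[0] for m in movies]
stars = [m[1] for m in movies]
audience = [m[2] for m in movies]

# Correlation of each indicator with box-office performance.
r_sent = pearson_r(sentiment, audience)
r_star = pearson_r(stars, audience)

# Split movies at the mean sentiment score, then compare star ratings.
cut = mean(sentiment)
high = [star for score, star, _ in movies if score >= cut]
low = [star for score, star, _ in movies if score < cut]
t_stat, dof = independent_t(high, low)

print(f"r(sentiment, audience) = {r_sent:.3f}")
print(f"r(stars, audience)     = {r_star:.3f}")
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A p-value would come from comparing `t_stat` against the t distribution with `dof` degrees of freedom, which is left out here to keep the sketch free of third-party dependencies.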

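Anyone can freely post reviews online of the products they have used or the services they have experienced, and the volume of such data keeps growing. Sentiment analysis is used to identify the sentiments and emotions embedded in user-generated online text. Among the many possible data domains, this study analyzes movie reviews. Previous studies of movie reviews have often equated the review rating with the audience's sentiment and used it for sentiment analysis. However, the actual polarity of a review's content does not always match that of its rating, which can limit the accuracy of such research. This study therefore builds a machine-learning-based sentiment classifier and uses it to compute sentiment scores for reviews, with the aim of quantifying the sentiment they express. It then uses the computed sentiment scores to examine the relationship between reviews and box-office performance. The sentiment analysis model was built with a support vector classifier and a neural network, using a total of 10,000 movie reviews as training data. The model was applied to 1,258,538 reviews of 175 movies. The relationships between review ratings and box-office performance, and between sentiment scores and box-office performance, were examined through correlation analysis, and a t-test comparing the means of the two indicators was used to verify the usefulness of the sentiment scores. The model-building method proposed here achieved higher accuracy than a model built with a Naive Bayes classifier. The correlation analysis showed a significant positive correlation between a movie's weekly average rating and its audience size, and the correlation analysis between sentiment scores and audience size produced similar results. A t-test on the means of the two indicators was then performed, verifying that the computed sentiment scores can serve as an indicator in place of review ratings. Building on this verified conclusion, the study also collected tweets mentioning movies on Twitter and applied sentiment analysis to them in order to explore practical uses of the model. Through this overall process of experimentation and verification, the study shows that it can offer an improved sentiment classification method for sentiment analysis research, and therein lies its significance.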
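
For readers unfamiliar with the Naive Bayes baseline that the proposed classifier is compared against, a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing follows. It is not the authors' implementation: the toy English reviews, the whitespace tokenization, and the function names (`train_nb`, `predict_nb`, `accuracy`) are all assumptions made for illustration, whereas the study itself works with Korean reviews from Naver Movie.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect the counts for a multinomial Naive Bayes model.
    docs is a list of (tokens, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of documents whose predicted label matches the gold label."""
    hits = sum(predict_nb(model, toks) == label for toks, label in docs)
    return hits / len(docs)


# Hypothetical toy data, NOT from the paper's dataset.
train = [
    ("great acting and a moving story".split(), "pos"),
    ("loved it wonderful film".split(), "pos"),
    ("boring plot and terrible acting".split(), "neg"),
    ("waste of time awful film".split(), "neg"),
]
held_out = [
    ("wonderful moving story".split(), "pos"),
    ("terrible boring waste".split(), "neg"),
]

model = train_nb(train)
print("held-out accuracy:", accuracy(model, held_out))
```

Accuracy computed this way on a held-out set is the kind of figure on which the abstract's comparison between the proposed classifier and Naive Bayes rests.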
