Reliability Analysis of VOC Data for Opinion Mining

  • Kim, Dongwon (Shipping Management, Korea Maritime & Ocean University) ;
  • Yu, Song Jin (Shipping Management, Korea Maritime & Ocean University)
  • Received : 2016.08.17
  • Accepted : 2016.10.25
  • Published : 2016.12.31

Abstract

The purpose of this study is to verify how seven sentiment domains, extracted from social media through sentiment analysis, influence business performance. The study consists of three phases. In phase I, we crawled 45,447 pieces of VOC (Voice of the Customer) on 26 auto companies from a car community, extracted the POS information, constructed a sentiment lexicon, and built seven sentiment domains. In phase II, to ensure the reliability of the experimental data, we performed auto-correlation analysis and PCA. In phase III, we used linear regression analysis to investigate how the seven domains affect the market share of three major auto companies (GM, FCA, and VOLKSWAGEN). The auto-correlation analysis showed auto-correlation and sequential patterns in the sentiments, and the PCA showed that the seven sentiments are connected with positivity, negativity, and neutrality. From the linear regression analysis on model 1, we identified that the sentiment factors have a significant influence on actual market share. In particular, not only the positive and negative sentiment domains but also neutral sentiment significantly affected auto market share. If the availability of such data is applied to the market, and the auto-correlation of market-related information and sentiment is exploited, the findings can contribute in various ways to other research on sentiment analysis as well as to actual business performance.

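The purpose of this study is to confirm whether the seven sentiment domains extracted from social media are reliable as data for experiments that analyze their influence on corporate performance, and to determine how actual customer sentiment affects automobile market share. The study consists of three phases. Phase 1 builds the sentiment lexicon: a total of 45,447 pieces of VOC (Voice of Customer) on 26 automobile manufacturers in the United States were crawled from an automobile community, POS information was extracted, a sentiment lexicon was constructed, and seven sentiment domains were created. Phase 2 is the reliability analysis: the suitability of the data for experimentation was verified through auto-correlation analysis and principal component analysis (PCA). In Phase 3, two linear regression models were built on the basis of the PCA, three companies (GM, FCA, and VOLKSWAGEN) were selected, and the influence of the seven sentiment domains on automobile market share from 2013 to 2015 was tested. The auto-correlation analysis revealed auto-correlation and time-series patterns in the sentiment data. The PCA confirmed that the sentiment domains are connected through negativity, positivity, and neutrality as principal components. Based on the reliability of the VOC sentiment data, the linear regression analysis of the two models showed that each company has sentiments that significantly affect its market share, that the sentiment influence differs between Model 1 and Model 2, and that neutrality has an effect. Through this study, we confirmed that sentiment carrying the information found in the data indicates, on the basis of past values, that changes in the automobile market can follow. In addition, when applying the availability of market data, if the auto-correlation of automobile-market-related information and sentiment can be exploited well, this is expected not only to contribute greatly to research on sentiment analysis but also to contribute in various ways to business performance in the actual market.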
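The reliability checks and the regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The data below are synthetic, the monthly granularity (36 observations for 2013 to 2015) is an assumption, and the helper names `autocorr`, `pca`, and `ols` are hypothetical. The sketch computes a lag-1 sample autocorrelation per sentiment domain, a PCA on the correlation matrix of the seven domains, and an ordinary least squares fit of market share on the domain scores.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc))

def pca(X):
    """PCA via eigen-decomposition of the correlation matrix.

    Returns explained-variance ratios (descending) and the loadings,
    one component per column.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    return eigvals[order] / eigvals.sum(), eigvecs[:, order]

def ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)

# Synthetic stand-in data (assumption): 36 monthly observations of
# 7 sentiment-domain scores, generated as AR(1) series so that they
# are auto-correlated, plus a made-up market-share series.
rng = np.random.default_rng(0)
n_months, n_domains = 36, 7
noise = rng.normal(size=(n_months, n_domains))
scores = np.zeros((n_months, n_domains))
scores[0] = noise[0]
for t in range(1, n_months):
    scores[t] = 0.7 * scores[t - 1] + noise[t]
weights = rng.normal(size=n_domains)
share = 10.0 + 0.1 * (scores @ weights) + rng.normal(scale=0.05, size=n_months)

lag1 = [autocorr(scores[:, j], 1) for j in range(n_domains)]  # reliability check 1
ratios, loadings = pca(scores)                                # reliability check 2
beta, r2 = ols(scores, share)                                 # regression step

print("lag-1 autocorrelation per domain:", np.round(lag1, 2))
print("explained variance ratios:", np.round(ratios, 2))
print("R^2 of the regression:", round(r2, 3))
```

In practice a formal test such as Ljung-Box would usually accompany the raw autocorrelations, and the regression coefficients would be reported with significance tests; both are omitted here to keep the sketch short.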
