Informal Quality Data Analysis via Sentimental analysis and Word2vec method

Korean title (translated): Analysis of Unstructured Quality Data Using Sentiment Analysis and Word2vec

  • Lee, Chinuk (Department of Industrial Engineering, Hanyang University) ;
  • Yoo, Kook Hyun (Department of Mathematics, Hanyang University) ;
  • Mun, Byeong Min (Department of Industrial Engineering, Hanyang University) ;
  • Bae, Suk Joo (Department of Industrial Engineering, Hanyang University)
  • Received : 2016.03.06
  • Accepted : 2017.03.22
  • Published : 2017.03.31

Abstract

Purpose: This study analyzes automobile quality review data to develop an alternative method for analyzing informal (unstructured) data. Existing methods for analyzing informal data rely mainly on how frequently items occur in the data; this research instead uses correlation information among the data. Methods: After sentiment analysis was performed to obtain user information about automobile products, three classification methods (naïve Bayes, random forest, and support vector machine) were employed to classify the informal user opinions with respect to automobile quality. In addition, Word2vec was applied to discover correlated information within the informal data. Results: Of the three classification methods, random forest gave the most effective results. The Word2vec method succeeded in finding the data most closely related to automobile components. Conclusion: The proposed method is effective in terms of accuracy and sensitivity for analyzing informal quality data; however, only two sentiments (positive and negative) could be categorized because of human error. Further studies are required to derive more sentiment categories so that informal quality data can be classified accurately. The Word2vec method also shows comparable results in precisely identifying the relevance of components.
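The classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only one of the three classifiers (a multinomial naïve Bayes with add-one smoothing, written with the Python standard library), and the review sentences, labels, and function names are hypothetical examples invented for illustration. Random forest, support vector machine, and Word2vec would in practice come from a dedicated library and are not shown here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reviews (NOT the study's data); 1 = positive, 0 = negative.
TRAIN = [
    ("engine runs smooth and quiet", 1),
    ("brakes feel strong and reliable", 1),
    ("comfortable seats and smooth ride", 1),
    ("engine noise is loud and annoying", 0),
    ("brakes squeal and feel weak", 0),
    ("transmission jerks and ride is rough", 0),
]
# Hypothetical held-out reviews used only for evaluation.
TEST = [
    ("quiet engine and comfortable ride", 1),
    ("loud brakes and rough ride", 0),
    ("strong and reliable engine", 1),
    ("weak transmission", 0),
]

def train_naive_bayes(samples):
    """Collect per-class word counts, class counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy_and_sensitivity(model, samples):
    """Accuracy = correct / all; sensitivity = true positives / actual positives.

    Assumes the samples contain at least one positive example.
    """
    tp = fn = correct = 0
    for text, label in samples:
        pred = predict(model, text)
        correct += pred == label
        if label == 1:
            tp += pred == 1
            fn += pred == 0
    return correct / len(samples), tp / (tp + fn)

model = train_naive_bayes(TRAIN)
accuracy, sensitivity = accuracy_and_sensitivity(model, TEST)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f}")
```

The two metrics computed at the end correspond to the accuracy and sensitivity criteria mentioned in the conclusion. On a data set this small they say nothing about real performance and only show how the quantities are defined.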
