User Sentiment Analysis on Amazon Fashion Product Review Using Word Embedding

  • Lee, Dong-yub (Dept. of Computer Science and Engineering, Korea University)
  • Jo, Jae-Choon (Dept. of Computer Science and Engineering, Korea University)
  • Lim, Heui-Seok (Dept. of Computer Science and Engineering, Korea University)
  • Received : 2017.02.09
  • Accepted : 2017.04.20
  • Published : 2017.04.28

Abstract

In modern society, the fashion market continues to grow both overseas and domestically. When consumers purchase a product through e-commerce, the reviews written by other consumers influence their decision to buy. Companies that analyze these reviews and reflect consumer feedback can, in turn, improve their performance. In this paper, we propose a method for building a model that analyzes user sentiment using a word embedding space learned from review data for Amazon fashion products. The word embedding space was learned from 5.7 million Amazon reviews, and three SVM classifiers were then trained on it, each with a different number of positive and negative reviews. The highest accuracy, 88.0%, was obtained by the SVM classifier trained on 50,000 positive and 50,000 negative reviews.
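The pipeline described in the abstract (review text mapped into a word embedding space, then classified by an SVM) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tiny two-dimensional `EMBEDDINGS` table is hand-made rather than learned, a review is represented by averaging its word vectors, and the classifier is a small linear SVM trained by subgradient descent on the hinge loss. The function names `review_vector`, `train_linear_svm`, and `predict` are hypothetical.

```python
import random

# Hypothetical 2-dimensional word vectors standing in for a learned embedding
# space; a real space would be trained on the review corpus itself.
EMBEDDINGS = {
    "love": [0.9, 0.1], "great": [0.8, 0.2], "perfect": [0.9, 0.0],
    "comfortable": [0.7, 0.1],
    "poor": [-0.8, 0.1], "cheap": [-0.7, 0.2], "terrible": [-0.9, 0.0],
    "tight": [-0.6, 0.2],
    "dress": [0.0, 0.9], "shoes": [0.1, 0.8], "fit": [0.0, 0.7],
}
DIM = 2


def review_vector(text):
    """Average the vectors of known words; return a zero vector if none match."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def decision(weights, bias, x):
    """Signed score of a feature vector under the linear model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=300, seed=0):
    """Linear SVM: stochastic subgradient descent on L2-regularized hinge loss."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Check the margin before this step's update.
            violated = y[i] * decision(weights, bias, X[i]) < 1
            # Shrink the weights (gradient of the L2 penalty).
            weights = [w * (1 - lr * lam) for w in weights]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                weights = [w + lr * y[i] * xi for w, xi in zip(weights, X[i])]
                bias += lr * y[i]
    return weights, bias


def predict(weights, bias, text):
    """Return 1 for a positive review and -1 for a negative one."""
    return 1 if decision(weights, bias, review_vector(text)) >= 0 else -1


# Toy labeled reviews (1 = positive, -1 = negative), invented for this sketch.
TRAIN = [
    ("love this dress", 1), ("great shoes perfect fit", 1),
    ("comfortable shoes", 1),
    ("terrible fit", -1), ("cheap poor dress", -1), ("tight shoes", -1),
]
X = [review_vector(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
weights, bias = train_linear_svm(X, y)

for text in ("perfect dress", "poor shoes"):
    print(text, "->", predict(weights, bias, text))
```

Averaging is only one simple way to turn a variable-length review into a fixed-length vector, and a hand-written trainer is used here only to keep the sketch free of dependencies. At the scale reported in the abstract, a library implementation of both the embedding training and the SVM would be the practical choice.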
