Unstructured Data Quantification Scheme Based on Text Mining for User Feedback Extraction

사용자 의견 추출을 위한 텍스트 마이닝 기반 비정형 데이터 정량화 방안

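  • 조중흠 (Department of Industrial Engineering, Hongik University) ;
  • 정용택 (Department of Industrial Engineering, Hongik University) ;
  • 최성욱 (Department of Industrial Engineering, Hongik University) ;
  • 옥창수 (Department of Industrial Engineering, Hongik University)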
  • Received : 2018.11.12
  • Accepted : 2018.12.10
  • Published : 2018.12.31

Abstract

People write reviews of countless products and services on the Internet, in blogs and on community bulletin boards. These unstructured data contain the authors' emotions and opinions about a product or service, which can provide important information for future product design or marketing. However, such text-based information cannot be evaluated quantitatively as it stands, so it is difficult to apply to mathematical models or optimization problems for product design and improvement. This study therefore proposes a method to quantitatively extract users' opinions or preferences about a specific product or service from the large volume of text-based information available online. The extracted unstructured text is decomposed into basic word units, and a positive rate is evaluated using existing sentiment dictionaries together with additional word lists proposed in this study. This offers a way to use unstructured text data, which is generated and stored in vast quantities, effectively in product or service design. Finally, to verify the effectiveness of the proposed method, a case study was conducted using movie review data retrieved from a portal website. By comparing the positive rates calculated by the proposed framework with user ratings for the movies, the study provides a guideline for text mining-based evaluation of unstructured data.
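The pipeline described in the abstract (split text into words, score them against a sentiment dictionary, compare the resulting positive rate with user ratings) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the definition used here, positive matches divided by all dictionary matches, is an assumption. The word lists, the regex tokenizer, the sample reviews, and the function names `positive_rate` and `pearson` are likewise hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy lexicons. A real study would load an existing sentiment
# dictionary and extend it with a supplementary, domain-specific word list.
POSITIVE = {"great", "fun", "moving", "good"}
NEGATIVE = {"boring", "bad", "dull", "weak"}


def tokenize(text):
    """Split text into lowercase word tokens (a stand-in for proper
    morphological analysis)."""
    return re.findall(r"[a-z']+", text.lower())


def positive_rate(text, positive=POSITIVE, negative=NEGATIVE):
    """Assumed definition: positive matches / (positive + negative matches).
    Returns None when no token appears in either lexicon."""
    tokens = tokenize(text)
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    matched = pos + neg
    return pos / matched if matched else None


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Made-up (review text, user rating) pairs, for illustration only.
reviews = [
    ("Great cast and a moving story, really fun", 9),
    ("Good start but a boring, dull ending", 5),
    ("Bad script, weak acting, boring", 2),
]
rates = [positive_rate(text) for text, _ in reviews]
ratings = [rating for _, rating in reviews]
agreement = pearson(rates, ratings)
```

A correlation close to 1 between `rates` and `ratings` would suggest that the dictionary-based score tracks the users' own ratings; a low value would point to gaps in the word lists, which is the kind of case the supplementary lists mentioned in the abstract are meant to address.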
