A Study of 'Emotion Trigger' by Text Mining Techniques

  • An, Juyoung (Department of Library and Information Science, College of Liberal Arts, Yonsei University) ;
  • Bae, Junghwan (Department of Library and Information Science, College of Liberal Arts, Yonsei University) ;
  • Han, Namgi (Department of Library and Information Science, College of Liberal Arts, Yonsei University) ;
  • Song, Min (Department of Library and Information Science, Yonsei University)
  • Received : 2015.06.05
  • Accepted : 2015.06.18
  • Published : 2015.06.30

Abstract

The explosion of social media data has led researchers to apply text-mining techniques to analyze large-scale social media data more rigorously. Although algorithms for social media text analysis have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. The first, and most common, is the linguistic approach using machine learning; some studies following it add grammatical factors to the feature sets used to train classification models. The second applies semantic analysis methods to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle more extensive semantic features that have been underestimated in existing sentiment analysis. The result of applying Word2Vec is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that, among the related words extracted, those expressing some emotion about the keyword are three times more numerous with Word2Vec than with co-occurrence analysis. This difference stems from Word2Vec's vectorization of semantic features. Word2Vec can therefore capture hidden related words that traditional analysis does not find. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives, which are treated as "emotional words". The emotional words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence.
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords (professor, prosecutor, and doctor), because these keywords attract rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can reveal hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are Twitter data, so the data have a number of distinct features that were not dealt with in this study.
These features will be considered in further study.
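
The Emotion Trigger step described above (take an adjective as the emotional word, look up its related words in vector space, keep only the nouns) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says was written in Java. Python is used here only for brevity, and the three-dimensional vectors, the POS labels, and the function name `emotion_triggers` are all hypothetical stand-ins for a trained Word2Vec model and a Korean POS tagger.

```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
VECTORS = {
    "angry": [0.9, 0.1, 0.0],
    "sad": [0.2, 0.8, 0.1],
    "rebate": [0.8, 0.2, 0.1],
    "sponsor": [0.7, 0.1, 0.3],
    "hospital": [0.1, 0.9, 0.2],
}

# Hypothetical POS labels; a real pipeline would obtain these from a POS tagger.
POS = {
    "angry": "ADJ",
    "sad": "ADJ",
    "rebate": "NOUN",
    "sponsor": "NOUN",
    "hospital": "NOUN",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def emotion_triggers(emotion_word, vectors, pos, top_n=2):
    """Return the nouns closest to an adjective (the 'emotional word')."""
    if pos.get(emotion_word) != "ADJ":
        raise ValueError("emotional words are assumed to be adjectives")
    query = vectors[emotion_word]
    # Score only nouns, since nouns are the candidate trigger factors.
    scored = [
        (cosine(query, vec), word)
        for word, vec in vectors.items()
        if word != emotion_word and pos.get(word) == "NOUN"
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    print(emotion_triggers("angry", VECTORS, POS))
    print(emotion_triggers("sad", VECTORS, POS, top_n=1))
```

With these toy values, the nouns nearest to "angry" are "rebate" and "sponsor", and the noun nearest to "sad" is "hospital". As the abstract itself cautions, closeness in vector space does not establish a causal relationship between the noun and the emotional word.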

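With the recent explosive growth in social media use, research on various text mining techniques for analyzing the vast amount of user-generated data has been actively conducted. The accuracy and sophistication of text analysis algorithms have improved accordingly; however, particularly in the area of sentiment analysis, they are limited in that they apply only the grammatical elements of language and fail to consider pragmatic and semantic elements. To compensate for this limitation, this study applies the Word2Vec technique, which can take semantic features into account more broadly than existing algorithms. In addition, among Korean parts of speech, adjectives are classified as "emotional words" that express emotion, and the nouns among the related words of the emotional words extracted through the Word2Vec model are defined as the factors that trigger the emotion; this entire process is named "Emotion Trigger". As a case study, this study selects specific incidents involving three occupational groups (professors, prosecutors, and doctors) that became social issues, and analyzes public perception of these incidents. To consider both general public opinion and directly expressed personal opinions about the incidents, news, blogs, and Twitter were selected as data sources. The collected data are large enough to yield meaningful results and form time series data that can support various future studies. The significance of this study is that, in revealing the relationships between keywords, it applies the Word2Vec technique to combine semantic elements in order to overcome the limitations of existing sentiment analysis. In the process, Emotion Triggers that induce emotions could be identified, which may help to understand the general public's reaction to social issues, find the causes, and solve social problems.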
