The Construction of a Domain-Specific Sentiment Dictionary Using Graph-based Semi-supervised Learning Method

그래프 기반 준지도 학습 방법을 이용한 특정분야 감성사전 구축

  • Kim, Jung-Ho (Department of Computer Engineering, Korea Aerospace University) ;
  • Oh, Yean-Ju (Department of Computer Engineering, Korea Aerospace University) ;
  • Chae, Soo-Hoan (The School of Electronics and Telecommunication, Korea Aerospace University)
  • 김정호 (한국항공대학교 컴퓨터공학과) ;
  • 오연주 (한국항공대학교 컴퓨터공학과) ;
  • 채수환 (한국항공대학교 전자 및 정보통신공학부)
  • Received : 2014.08.25
  • Accepted : 2015.02.10
  • Published : 2015.03.30

Abstract

A sentiment lexicon is essential for expressing sentiment in text or recognizing sentiment from text. We propose a graph-based semi-supervised learning method that constructs a sentiment dictionary, that is, a set of sentiment lexicons. In particular, we focus on constructing a domain-specific sentiment dictionary. The proposed method builds a graph from lexicons and the proximity among them, and then propagates the sentiments of some lexicons whose sentiment values are already known to all lexicons on the graph. Sentiment lexicons are typically of two types, sentiment words and sentiment phrases; we build a separate graph for each type and infer the sentiment of every lexicon to construct the full dictionary. To verify the proposed method, we constructed a sentiment dictionary specific to the movie domain and conducted sentiment classification experiments with it. The results show that classification with this dictionary performs better than classification with a typical general-purpose sentiment dictionary.

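Sentiment lexicons serve as features for expressing sentiment in text or, conversely, for recognizing sentiment from text, and they are an essential element of sentiment classification research. This study proposes a graph-based semi-supervised learning method that automatically constructs a sentiment dictionary, a set of sentiment lexicons. In particular, we aim to construct a separate sentiment dictionary for each domain, taking into account the ambiguity problem in which the sentiment of a lexicon changes with the domain in which it is used. The proposed method builds a graph from lexicons and the proximity between them, and infers the sentiment of every lexicon by propagating the sentiments of a small number of previously learned sentiment lexicons across the whole graph. Sentiment lexicons typically consist of sentiment words and sentiment phrases; in this study we built a graph for each, inferred their sentiments, and constructed the complete sentiment dictionary. To verify the performance of the proposed method, we constructed a sentiment dictionary for the movie review domain and carried out movie review sentiment classification experiments with it. The results confirmed higher classification performance than sentiment classification using the lexicons of an existing general-purpose sentiment dictionary.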
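
The propagation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's proximity measure or update rule, so this example assumes a generic label propagation scheme with clamped seeds: each unlabeled lexicon repeatedly takes the proximity-weighted average of its neighbours' scores, while the seed lexicons keep their known values. The function name `propagate_sentiment`, the toy vocabulary, and all edge weights are hypothetical and chosen only for illustration.

```python
def propagate_sentiment(graph, seeds, n_iter=100, tol=1e-6):
    """Spread seed sentiment scores over a weighted lexicon graph.

    graph: {lexicon: {neighbour: proximity weight}} (assumed symmetric)
    seeds: {lexicon: known sentiment score in [-1, 1]}
    Returns {lexicon: inferred sentiment score}.
    """
    # Unlabeled lexicons start neutral; seeds start at their known value.
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(n_iter):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp the labeled lexicons
                continue
            total = sum(neighbours.values())
            if total == 0:
                updated[node] = 0.0  # isolated lexicon stays neutral
                continue
            # Proximity-weighted average of the neighbours' current scores.
            updated[node] = sum(
                weight * scores[nb] for nb, weight in neighbours.items()
            ) / total
        change = max(abs(updated[node] - scores[node]) for node in graph)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical movie-domain graph; the weights are made up for illustration.
graph = {
    "great": {"touching": 3},
    "touching": {"great": 3, "predictable": 1},
    "predictable": {"touching": 1, "boring": 2},
    "boring": {"predictable": 2, "awful": 3},
    "awful": {"boring": 3},
}
seeds = {"great": 1.0, "awful": -1.0}
dictionary = propagate_sentiment(graph, seeds)
```

In this toy graph, "touching" ends up positive because it is tied most strongly to the positive seed, while "predictable" and "boring" end up negative. Under the same assumptions, the sentiment-word graph and the sentiment-phrase graph could each be processed this way and the results merged into one dictionary.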
