Feature Weighting for Opinion Classification of Comments on News Articles

Feature Weight Setting for Sentiment Classification of News Comments

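  • 이공주 (Department of Information and Communication Engineering, Chungnam National University) ;
  • 김재훈 (Department of Computer Engineering, Korea Maritime University) ;
  • 서형원 (Department of Computer Engineering, Korea Maritime University) ;
  • 류길수 (Department of Computer Engineering, Korea Maritime University)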
  • Received : 2010.07.01
  • Accepted : 2010.08.31
  • Published : 2010.09.30

Abstract

In this paper, we present a system that classifies comments on a news article by the polarity (positive or negative) of the opinion they express. The system is a document classifier for comments and is based on machine learning techniques such as support vector machines. Unlike ordinary documents, comments accompany a body text whose content can influence the opinions they express. We propose a feature weighting scheme for opinion classification that exploits this characteristic of comments together with several resources. In our experiments, the weighting scheme proved useful for opinion classification of comments on Korean news articles. Korean character n-grams (bigrams or trigrams) also proved helpful for opinion classification of comments that contain many Internet slang words or typos. In future work, we will apply this scheme to opinion analysis of comments on product reviews as well as news articles.

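This paper proposes a system that classifies the user sentiment expressed in comments on news articles. The proposed system is a document classification system for comments and is based on machine learning. Unlike ordinary documents, a comment has an associated body text, and the content of that body text can influence the reader's sentiment. Using this characteristic of comments together with various resources, we propose features for sentiment classification and a method for setting their weights. Experiments showed that this weighting method is effective for classifying sentiment in comments on Korean-language news. We also confirmed that character-level two-syllable and three-syllable features are well worth using for documents that, like comments, contain many errors. In the future, we plan to apply the method not only to comments on news articles but also to general sentiment analysis, such as product comments.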
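
The character n-gram features mentioned in the abstract can be illustrated with a minimal sketch. It only shows how overlapping two-syllable and three-syllable units might be extracted from a Korean comment and counted; it does not reproduce the paper's weighting scheme, which the abstract does not specify. The function names `char_ngrams` and `ngram_features`, the per-token windowing, and the NFC normalization step are assumptions made for illustration.

```python
import unicodedata
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams within each whitespace-separated token.

    Assumption: n-grams do not cross token boundaries. The text is normalized
    to NFC so that each Hangul syllable is a single code point.
    """
    text = unicodedata.normalize("NFC", text)
    grams = []
    for token in text.split():
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def ngram_features(comment: str, sizes=(2, 3)) -> Counter:
    """Count character bigrams and trigrams in one comment (raw counts, unweighted)."""
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(comment, n))
    return counts


# Hypothetical example comment ("really like it").
features = ngram_features("정말 좋아요")
```

Because the units are plain character windows, no morphological analysis is needed, which is one plausible reason such features tolerate slang and typos. The resulting counts could be passed to any classifier, such as a support vector machine, after whatever weighting is chosen.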
