An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing (LSI)

An Experimental Study on the Automatic Classification of Opinion Documents Using the Supervised Latent Semantic Indexing (LSI) Technique

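  • 이지혜 (Graduate School, Department of Library and Information Science, Yonsei University)
  • 정영미 (Department of Library and Information Science, Yonsei University)
  • Published: 2009.09.30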

Abstract

This study applies latent semantic indexing (LSI) techniques to the automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents such as reviews and news articles, 500 labelled positive and 500 labelled negative. To identify the optimal feature set for opinion classification, sets of content words and sentiment words were extracted using a POS tagger. The results showed that LSI techniques were more effective than term indexing for sentiment classification, and that the best performance was achieved by a supervised LSI technique.
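
As an illustration of the comparison described above, the following is a minimal sketch of plain LSI next to one published form of supervised LSI, often called sprinkling, in which artificial class-label terms are appended to the term-document matrix before the SVD. The abstract does not state which supervised variant or classifier the study used, so the toy vocabulary and counts, the helper names (`sprinkle`, `fit_lsi`, `fold_in`, `nearest_centroid`), the rank `k=2`, and the nearest-centroid classifier are all assumptions made for illustration.

```python
import numpy as np

def sprinkle(X, labels, n_classes, n_sprinkle=2):
    """Append n_sprinkle artificial class-label rows per class to X (terms x docs)."""
    labels = np.asarray(labels)
    onehot = (np.arange(n_classes)[:, None] == labels[None, :]).astype(float)
    return np.vstack([X, np.repeat(onehot, n_sprinkle, axis=0)])

def fit_lsi(X, k):
    """Rank-k truncated SVD; returns the term basis U_k and singular values s_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(U_k, s_k, D):
    """Project term vectors (columns of D) into the k-dimensional latent space."""
    return (U_k.T @ D) / s_k[:, None]

def nearest_centroid(train_z, labels, test_z, n_classes):
    """Assign each test column to the class centroid with the highest cosine similarity."""
    labels = np.asarray(labels)
    cents = np.stack([train_z[:, labels == c].mean(axis=1) for c in range(n_classes)], axis=1)
    unit = lambda M: M / (np.linalg.norm(M, axis=0, keepdims=True) + 1e-12)
    return np.argmax(unit(cents).T @ unit(test_z), axis=0)

# Toy term-by-document counts (assumed data). Rows: good, great, bad, awful, movie.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2],
              [1, 1, 1, 1]], dtype=float)
labels = [0, 0, 1, 1]                       # 0 = positive, 1 = negative
X_test = np.array([[1, 0], [0, 0], [0, 0], [0, 1], [1, 1]], dtype=float)
n_terms = X.shape[0]

# Plain (unsupervised) LSI: SVD of the raw term-document matrix.
U, s = fit_lsi(X, k=2)
pred_unsup = nearest_centroid(fold_in(U, s, X), labels, fold_in(U, s, X_test), 2)

# Supervised LSI via sprinkling: SVD of the augmented matrix, then keep only
# the rows of U_k that belong to real terms, so unlabeled documents can be folded in.
Xs = sprinkle(X, labels, n_classes=2)
Us, ss = fit_lsi(Xs, k=2)
Ut = Us[:n_terms]
pred = nearest_centroid(fold_in(Ut, ss, X), labels, fold_in(Ut, ss, X_test), 2)
print(pred.tolist())  # [0, 1]
```

On this tiny example both variants classify the two test documents the same way; the sprinkled rows only pull same-class documents closer together in the latent space, which is the effect a supervised variant is intended to have on larger, noisier collections.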

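This study carried out classification experiments using latent semantic indexing, a form of concept indexing, in order to improve the automatic classification of opinion documents, that is, documents expressing opinions or sentiment. The 1,000 opinion documents collected for the experiments comprise 500 positive and 500 negative documents. Morphological analysis of the document text was used to build a content word set consisting of nouns and an opinion word set consisting of predicates, adverbs, and roots. When the documents were classified with the different feature sets, the best performance was obtained with the opinion word set under term indexing, with the combined content and opinion word set under latent semantic indexing, and with the content word set under supervised latent semantic indexing. Overall, latent semantic indexing classified opinion documents better than term indexing, and supervised latent semantic indexing gave the best classification performance.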
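
The three feature sets compared in the abstract (content words, opinion words, and their union) can be sketched as a filter over POS-tagged tokens. This is only an illustrative sketch: the study's tagger and tag set are not given here, so the coarse tags `NOUN`, `PRED`, `ADV`, and `ROOT`, the `feature_counts` helper, and the sample tokens are hypothetical.

```python
from collections import Counter

# Hypothetical coarse POS tags; the study's actual tagger and tag set are not given.
CONTENT_TAGS = {"NOUN"}                  # content words: nouns
OPINION_TAGS = {"PRED", "ADV", "ROOT"}   # opinion words: predicates, adverbs, roots

FEATURE_SETS = {
    "content": CONTENT_TAGS,
    "opinion": OPINION_TAGS,
    "combined": CONTENT_TAGS | OPINION_TAGS,
}

def feature_counts(tagged_tokens, feature_set):
    """Count the words whose tag belongs to the chosen feature set."""
    keep = FEATURE_SETS[feature_set]
    return Counter(word for word, tag in tagged_tokens if tag in keep)

# Assumed sample input: (word, tag) pairs as a POS tagger might emit them.
review = [("battery", "NOUN"), ("really", "ADV"), ("last", "PRED"),
          ("screen", "NOUN"), ("is", "AUX"), ("dull", "PRED")]

content = feature_counts(review, "content")
opinion = feature_counts(review, "opinion")
combined = feature_counts(review, "combined")
```

Each resulting count vector would become one column of the term-document matrix for the corresponding feature set, so the same classifier can be run on all three.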
