An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing (LSI)

Lee, Ji-Hye; Chung, Young-Mee

doi:10.3743/KOSIM.2009.26.3.451

Journal of the Korean Society for Information Management (정보관리학회지)

Volume 26, Issue 3, pp. 451-462 (2009)

1013-0799 (pISSN) / 2586-2073 (eISSN)

Korean Society for Information Management (한국정보관리학회)


지도적 잠재의미색인(LSI)기법을 이용한 의견 문서 자동 분류에 관한 실험적 연구

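Lee, Ji-Hye (이지혜), Graduate School, Department of Library and Information Science, Yonsei University;
Chung, Young-Mee (정영미), Department of Library and Information Science, Yonsei University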

Published: 2009.09.30


Abstract

The aim of this study is to apply latent semantic indexing (LSI) techniques to the efficient automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents, such as reviews and news articles, of which 500 were labelled positive and the remaining 500 negative. To identify the optimal feature set for opinion classification, sets of content words and sentiment words were extracted using a part-of-speech (POS) tagger. The findings showed that LSI techniques were more effective than a term indexing method for sentiment classification, and that the best performance was achieved by a supervised LSI technique.
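To make the comparison between term indexing, LSI and supervised LSI concrete, here is a minimal sketch on a toy corpus; it is not the authors' code. It assumes raw term-frequency vectors for term indexing, a truncated SVD with fold-in projection for LSI, and, for supervised LSI, the "sprinkling" variant in which artificial class-label terms are appended to training documents before the SVD. The abstract does not state which supervised LSI method, term weighting, rank or classifier the study used, so the nearest-centroid cosine classifier, `k=2`, and helpers such as `sprinkle` and `run` are hypothetical.

```python
import numpy as np

# Minimal sketch. Assumptions: raw term frequencies, nearest-centroid cosine
# classifier, and "sprinkling" as the supervised LSI variant.

def build_vocab(docs):
    # Map each distinct training term to a row index of the matrix.
    return {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}

def term_doc_matrix(docs, vocab):
    # Term indexing: terms x documents matrix of raw term frequencies.
    # Terms never seen in training are ignored.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            if t in vocab:
                A[vocab[t], j] += 1.0
    return A

def lsi_fit(A, k):
    # LSI: keep the k largest singular values/vectors of A = U S V^T.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

def lsi_project(X, U_k, s_k):
    # Fold documents (columns of X) into the latent space: S_k^-1 U_k^T x.
    return (U_k.T @ X) / s_k[:, None]

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def centroid_classify(train_vecs, labels, test_vecs):
    # Hypothetical classifier: pick the class whose mean training vector
    # has the highest cosine similarity with each test column.
    classes = sorted(set(labels))
    cents = {c: train_vecs[:, [i for i, l in enumerate(labels) if l == c]].mean(axis=1)
             for c in classes}
    return [max(classes, key=lambda c: cosine(test_vecs[:, j], cents[c]))
            for j in range(test_vecs.shape[1])]

def sprinkle(docs, labels, copies=2):
    # Supervised LSI via "sprinkling": append artificial class-label terms to
    # each training document so the SVD is pulled toward class structure.
    # Test documents are never sprinkled.
    return [doc + [f"__class_{l}__"] * copies for doc, l in zip(docs, labels)]

def run(train, labels, test, k=2, supervised=False, use_lsi=True):
    docs = sprinkle(train, labels) if supervised else train
    vocab = build_vocab(docs)
    A = term_doc_matrix(docs, vocab)
    T = term_doc_matrix(test, vocab)  # class-label rows stay zero for test docs
    if not use_lsi:
        return centroid_classify(A, labels, T)
    U_k, s_k = lsi_fit(A, k)
    return centroid_classify(lsi_project(A, U_k, s_k), labels, lsi_project(T, U_k, s_k))

train = [["good", "great"], ["great", "fun"], ["bad", "awful"], ["awful", "boring"]]
labels = ["pos", "pos", "neg", "neg"]
test = [["good"], ["awful"]]
print(run(train, labels, test, use_lsi=False))    # term indexing  -> ['pos', 'neg']
print(run(train, labels, test))                   # LSI            -> ['pos', 'neg']
print(run(train, labels, test, supervised=True))  # supervised LSI -> ['pos', 'neg']
```

On a toy corpus this small all three schemes agree; the differences reported in the study only appear on realistic data, where LSI can match documents that share no surface terms and sprinkling biases the latent dimensions toward the positive/negative split.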

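In this study, classification experiments were carried out using latent semantic indexing, a form of concept indexing, to improve the automatic classification of opinion documents, i.e. documents that express opinions or emotions. The 1,000 opinion documents collected for the experiments comprise 500 positive and 500 negative documents. Through morphological analysis of the document texts, a content-word set consisting of nouns and an opinion-word set consisting of predicates (verbs and adjectives), adverbs and roots were constructed. When the opinion documents were classified with each of these feature sets, the opinion-word set performed best under term indexing, the combined set of content words and opinion words performed best under latent semantic indexing, and the content-word set performed best under supervised latent semantic indexing. Overall, latent semantic indexing outperformed term indexing in the automatic classification of opinion documents, and supervised latent semantic indexing yielded the best classification performance.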
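The abstract above defines the feature sets by part of speech: content words are nouns, and opinion words are predicates, adverbs and roots. A minimal sketch of that split follows, assuming the morphological analyser emits Sejong-style tags (NNG/NNP for nouns, VV/VA for verbs and adjectives, MAG for adverbs, XR for roots). The abstract does not name the tagger or tag set actually used, and `feature_sets` is a hypothetical helper.

```python
# Hypothetical tag sets (assumption: Sejong-style POS tags from a Korean
# morphological analyser). Nouns form the content-word set; predicates
# (verbs, adjectives), adverbs and roots form the opinion-word set.
CONTENT_TAGS = {"NNG", "NNP"}             # common and proper nouns
OPINION_TAGS = {"VV", "VA", "MAG", "XR"}  # verb, adjective, adverb, root

def feature_sets(tagged_doc):
    """Split a list of (morpheme, tag) pairs into the three feature sets."""
    content = [m for m, tag in tagged_doc if tag in CONTENT_TAGS]
    opinion = [m for m, tag in tagged_doc if tag in OPINION_TAGS]
    return {"content": content, "opinion": opinion, "combined": content + opinion}

# "영화는 정말 지루하다" ("The movie is really boring"), tagged by hand.
tagged = [("영화", "NNG"), ("는", "JX"), ("정말", "MAG"),
          ("지루", "XR"), ("하", "XSA"), ("다", "EF")]
print(feature_sets(tagged))
# {'content': ['영화'], 'opinion': ['정말', '지루'], 'combined': ['영화', '정말', '지루']}
```

Particles (JX), derivational suffixes (XSA) and endings (EF) are dropped from every set, which is what keeps the opinion-word set focused on sentiment-bearing stems such as the root 지루 ("boring").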

Cited by

A Study on Extracting Ideas from Documents and Webpages in the Field of Idea Mining, vol. 29, no. 1, 2012, https://doi.org/10.3743/KOSIM.2012.29.1.025
Experimental Study for Effective Combination of Opinion Features, vol. 27, no. 3, 2010, https://doi.org/10.3743/KOSIM.2010.27.3.227
