Outlier Detection Techniques for Biased Opinion Discovery

Outlier Detection Techniques for Detecting Biased Opinion Documents

  • Yeon, Jongheum (School of Computer Science and Engineering, Seoul National University)
  • Shim, Junho (Department of Computer Science, Sookmyung Women's University)
  • Lee, Sanggoo (School of Computer Science and Engineering, Seoul National University)
  • Received : 2013.10.29
  • Accepted : 2013.11.22
  • Published : 2013.11.30

Abstract

Users of social media post many kinds of opinions, such as product reviews and movie reviews, and customers commonly rely on these opinions when making decisions. However, as the use of opinions grows, distorted feedback has also increased. For example, exaggerated positive opinions are posted to promote target products, as are negative opinions that deviate far from the common evaluation. Finding these biased opinions has become important for keeping social media reliable. Opinion mining (or sentiment analysis) techniques have been developed to determine the sentiment polarity of opinionated documents, and they can be used to find biased opinions. However, the existing techniques have drawbacks: they categorize text only as positive or negative, and they need a large amount of training data to build the classifier. In this paper, we propose methods for discovering biased opinions that are skewed from the overall common opinion. The methods are based on angle-based outlier detection and personalized PageRank, and can be applied without training data. We analyze the performance of the proposed techniques through experimental results on a movie review dataset.

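Social media carries many kinds of opinions, such as product reviews and movie reviews, and it has become common for users to consult these opinions when deciding on purchases. However, as opinion information is used more widely, cases of improperly distorting it are also increasing. Examples include writing reviews with excessively positive opinions for promotional purposes or, conversely, posting excessively negative opinions that depart from the general evaluation. Because biased opinions bear on the reliability of social media, detecting them is becoming an increasingly important problem. Existing opinion mining, or sentiment analysis, analyzes a document to determine the polarity of the opinion it holds. However, existing research has focused on classifying opinions simply as positive or negative, and in particular has the drawback of requiring a sufficient amount of training data labeled in advance by opinion polarity. This paper proposes techniques that, when no training data is available, detect opinion documents that deviate from the distribution of opinion polarity across all documents. The techniques use angle-based outlier detection and personalized PageRank. We also conducted experiments on movie review documents to analyze the performance of the proposed methods.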
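The abstract names angle-based outlier detection as one of the two building blocks but does not spell out the computation. The following is a minimal sketch of the general angle-based outlier factor (ABOF) idea, not the authors' implementation: for each point, it takes the weighted variance of the distance-scaled angles to all pairs of other points, and a low variance suggests an outlier. The function name `abof_scores` and the 2-D feature vectors in `reviews` are hypothetical; how the paper turns review documents into feature vectors is not stated in the abstract.

```python
import numpy as np
from itertools import combinations


def abof_scores(points):
    """Return one angle-based outlier factor per row of `points`.

    A low score means the point sees all other points within a narrow
    range of directions, which is typical of an outlier.
    """
    points = np.asarray(points, dtype=float)
    scores = np.empty(len(points))
    for i, a in enumerate(points):
        # Difference vectors from point a to every other point.
        diffs = np.delete(points, i, axis=0) - a
        values, weights = [], []
        for ab, ac in combinations(diffs, 2):
            nab, nac = np.linalg.norm(ab), np.linalg.norm(ac)
            if nab == 0 or nac == 0:
                continue  # skip exact duplicates of a
            # Scalar product scaled by squared lengths, so distant
            # pairs contribute smaller values.
            values.append(np.dot(ab, ac) / (nab ** 2 * nac ** 2))
            # Closer pairs get a larger weight.
            weights.append(1.0 / (nab * nac))
        values = np.array(values)
        weights = np.array(weights)
        mean = np.average(values, weights=weights)
        scores[i] = np.average((values - mean) ** 2, weights=weights)
    return scores


# Hypothetical feature vectors: five reviews close together and one
# far away from the rest (index 5).
reviews = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [0.5, 0.5],
    [10.0, 10.0],
])
scores = abof_scores(reviews)
most_biased = int(np.argmin(scores))
print(most_biased, scores)
```

This naive version compares every pair of points for every point, so its cost grows cubically with the number of documents; it is meant only to show why no labeled training data is needed, since the score depends on the geometry of the data alone.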
