Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary

(Korean title: Building a Topic-Oriented Sentiment Dictionary for Predicting the Direction of the Stock Index)

  • Yu, Eunji (Graduate School of Business IT, Kookmin University)
  • Kim, Yoosin (Graduate School of Business IT, Kookmin University)
  • Kim, Namgyu (Graduate School of Business IT, Kookmin University)
  • Jeong, Seung Ryul (Graduate School of Business IT, Kookmin University)
  • Received : 2013.02.22
  • Accepted : 2013.03.11
  • Published : 2013.03.31

Abstract

Recently, the amount of unstructured data generated through a variety of social media has been increasing rapidly, creating a growing need to collect, store, search, analyze, and visualize these data. Because of their vast volume and unstructured nature, such data cannot be handled appropriately by the traditional methodologies used to analyze structured data. Accordingly, many attempts are being made to analyze unstructured data such as text files and log files with various commercial and noncommercial analytical tools. Among the contemporary issues addressed in the literature on unstructured text analysis, the concepts and techniques of opinion mining have attracted much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze people's opinions, sentiments, evaluations, attitudes, and emotions about products, services, organizations, social issues, and so on. Opinion mining techniques are increasingly being applied to complicated problems that traditional approaches could not solve. One of the most representative of these attempts is recent research that proposed an intelligent model for predicting the direction of the stock index, working mainly from opinions extracted from a very large number of economic news reports. News content published in various media is a typical example of unstructured text data: every day, a large volume of new content is created, digitized, and distributed to us via online and offline channels. Many studies have shown that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect that stock market fluctuations can be partly predicted by analyzing the relationship between economic news reports and stock price patterns. So far, most studies in the opinion mining literature, including ours, have used a sentiment dictionary to elicit sentiment polarity or sentiment values from large numbers of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values, and sentiment classifiers consult it to determine the sentiment polarity of words, of the sentences in a document, and of the whole document. However, most traditional approaches share a common limitation: they do not consider the flexibility of sentiment polarity. In a traditional sentiment dictionary, the polarity or sentiment value of a word is fixed and cannot change. In the real world, however, the sentiment polarity of a word can vary with the time, situation, and purpose of the analysis, and can even be contradictory. This flexibility of sentiment polarity motivated the present study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of a word's inherent meaning but on the basis of its ad hoc meaning within a particular context. To implement this idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news from the web, tags sentiment words, classifies the sentiment polarity of each news item, and finally predicts the direction of the next day's stock index. In addition, we apply a domain-specific sentiment dictionary instead of a general-purpose one to classify each news item as either positive or negative. To evaluate performance, we conducted intensive experiments and investigated the prediction accuracy of our model.
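
As a rough illustration of how a classifier can consult a sentiment dictionary of (word, value) pairs, the sketch below scores each sentence by summing the values of its dictionary words and takes the sign of the document total. The dictionary entries, the helper names `sentence_score` and `classify_document`, and the "neutral" result for a zero score are hypothetical; the paper labels each article as positive or negative, and its tokenizer and exact scoring rule are not described here.

```python
from typing import Dict, List

# Hypothetical toy dictionary: word -> sentiment value (illustrative only,
# not taken from the paper).
toy_dict: Dict[str, float] = {"rise": 1.0, "gain": 1.0, "fall": -1.0, "slump": -1.0}


def sentence_score(tokens: List[str], dictionary: Dict[str, float]) -> float:
    # Sum the values of dictionary words; unknown words contribute 0.
    return sum(dictionary.get(tok, 0.0) for tok in tokens)


def classify_document(sentences: List[List[str]], dictionary: Dict[str, float]) -> str:
    # Document polarity is the sign of the summed sentence scores.
    total = sum(sentence_score(s, dictionary) for s in sentences)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"  # tie handling is an assumption of this sketch


doc = [["stocks", "rise", "on", "exports"], ["analysts", "expect", "further", "gain"]]
print(classify_document(doc, toy_dict))  # positive
```

Summing per-sentence scores keeps the sentence level visible, so the same helper could also report which sentences drove a document's label.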
For these experiments on predicting the direction of the stock index, we collected and analyzed 1,072 stock market articles published by the "M" and "E" media outlets between July 2011 and September 2011.
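
Turning per-article labels into a next-day forecast requires an aggregation rule that the abstract does not specify. A minimal sketch, assuming a simple majority vote of positive versus negative articles per publication date (a tie predicts "down"), measures accuracy as the share of dates whose predicted direction matches the actual next-day move. The function names, dates, labels, and outcomes below are invented toy values, not the study's data, and aligning a news date with the following trading day is assumed to happen upstream.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def predict_next_day(article_labels: List[Tuple[date, str]]) -> Dict[date, str]:
    # Count [positive, negative] articles per date; other labels are ignored.
    counts: Dict[date, List[int]] = defaultdict(lambda: [0, 0])
    for day, label in article_labels:
        if label == "positive":
            counts[day][0] += 1
        elif label == "negative":
            counts[day][1] += 1
    # Majority vote; a tie predicts "down" (an arbitrary assumption).
    return {day: ("up" if p > n else "down") for day, (p, n) in counts.items()}


def direction_accuracy(predictions: Dict[date, str], actual: Dict[date, str]) -> float:
    # Fraction of dates whose predicted direction matches the actual next-day move.
    common = [d for d in predictions if d in actual]
    if not common:
        return 0.0
    return sum(predictions[d] == actual[d] for d in common) / len(common)


# Invented toy example (not data from the paper).
labels = [
    (date(2011, 7, 1), "positive"), (date(2011, 7, 1), "positive"),
    (date(2011, 7, 1), "negative"), (date(2011, 7, 4), "negative"),
    (date(2011, 7, 4), "negative"), (date(2011, 7, 5), "positive"),
]
actual = {date(2011, 7, 1): "up", date(2011, 7, 4): "up", date(2011, 7, 5): "up"}
predictions = predict_next_day(labels)
print(round(direction_accuracy(predictions, actual), 3))  # 0.667
```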

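(Korean abstract, translated) Recently, the amount of unstructured data generated through various social media has been increasing rapidly, and tools for storing, processing, and analyzing such data are being actively developed in step. In this environment, many attempts are being made to resolve issues that conventional structured-data analysis could not, by analyzing text data with various analytical tools. In particular, posts produced in near real time on Twitter and Facebook, together with writings on diverse topics posted on countless Internet sites, are motivating efforts to extract many people's opinions through large-scale text analysis and to discover new insights that can contribute to future revenue. A recent study that proposed a model for predicting rises and falls of the stock index through opinion mining of news data is a representative example of such attempts. The news we encounter daily through various media is itself one of the most representative forms of unstructured data. Opinion mining, or sentiment analysis, which analyzes such unstructured text, refers to a series of processes that analyze people's opinions, sentiments, evaluations, attitudes, and emotions toward products, services, organizations, issues, and their various attributes. Many opinion mining studies use a sentiment dictionary that assigns a positive or negative polarity to each word, and derive the polarity of a sentence or document from the distribution of polarities of the words that appear in it. However, the polarity of a given word is not uniquely fixed; it may differ depending on the purpose of the analysis. This study starts from that recognition. The phenomenon in which the same word's polarity is interpreted differently depending on the interpreter's standpoint or the purpose of the analysis is known to be a difficult issue that has not been addressed so far. Specifically, for the restricted topic of a rise in the stock index, we determine the polarity of each related word to construct a sentiment dictionary for predicting stock index rises, and we present the results of predicting stock index rises through news analysis based on this dictionary.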
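
The abstract does not spell out how each word's polarity toward a stock index rise is determined. One simple frequency-based way, sketched here under the hypothetical assumption that each article is labelled with whether the index rose on the following trading day, compares the share of a word's articles that are rise-labelled against the overall rise rate. The function name `build_domain_dictionary` and the `min_count` and `margin` parameters are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_domain_dictionary(
    labeled_docs: Iterable[Tuple[List[str], bool]],
    min_count: int = 2,
    margin: float = 0.1,
) -> Dict[str, int]:
    # labeled_docs: (tokens, index_rose_next_day) pairs (hypothetical labelling).
    rise_counts: Counter = Counter()
    total_counts: Counter = Counter()
    n_docs = n_rise = 0
    for tokens, rose in labeled_docs:
        n_docs += 1
        n_rise += int(rose)
        for tok in set(tokens):  # document frequency, not raw term frequency
            total_counts[tok] += 1
            if rose:
                rise_counts[tok] += 1
    if n_docs == 0:
        return {}
    base_rate = n_rise / n_docs  # overall share of rise-labelled documents
    dictionary: Dict[str, int] = {}
    for word, total in total_counts.items():
        if total < min_count:
            continue  # too rare to judge
        rise_share = rise_counts[word] / total
        if rise_share > base_rate + margin:
            dictionary[word] = 1   # leans toward "index rises"
        elif rise_share < base_rate - margin:
            dictionary[word] = -1  # leans toward "index falls"
        # otherwise the word is treated as neutral and omitted
    return dictionary


# Invented toy corpus (not data from the paper).
docs = [
    (["rate", "cut", "exports"], True),
    (["rate", "cut", "earnings"], True),
    (["rate", "hike", "exports"], False),
    (["rate", "hike", "earnings"], False),
]
print(build_domain_dictionary(docs))  # {'cut': 1, 'hike': -1}
```

In this toy corpus "cut", a word a general-purpose dictionary might mark negative, receives a positive polarity for the topic "stock index rises", which is the kind of context-dependent polarity the study targets.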
