Prediction improvement of election polls by unstructured data analysis

  • Park, Sunbin (Department of Business Statistics, Hannam University)
  • Kim, Myung Joon (Department of Business Statistics, Hannam University)
  • Received : 2018.08.08
  • Accepted : 2018.08.27
  • Published : 2018.10.31

Abstract

Social network services (SNS) have become a common tool for sharing and communicating individual opinions. On political issues in particular, users shape or extend public opinion by sharing content with others, for example by promoting the candidates they support. The accuracy of conventional survey-based opinion polls has been persistently disputed because of factors such as response rates and sampling methods. This study proposes a way to complement and improve the predictive power of opinion polls through sentiment analysis of the large volume of unstructured data on SNS. The proposed approach involves crawling unstructured data and applying an additional adjustment to an existing sentiment dictionary; it improved prediction accuracy by reducing the prediction error.

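The abstract describes the pipeline only at a high level: crawl SNS posts, score them with a sentiment dictionary adjusted for the domain, and use the result to complement poll figures. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Several choices are hypothetical and made here for illustration:

- the dictionary entries and adjustments;
- whitespace tokenisation;
- converting sentiment into support shares via each candidate's share of positively scored posts;
- blending those shares with poll shares through a fixed weight.

Real Korean-language posts would likely need morphological analysis rather than whitespace splitting.

```python
"""Minimal sketch: dictionary-based sentiment scores used to adjust poll shares.

All dictionary entries, weights and sample inputs are hypothetical.
"""

# Hypothetical base sentiment dictionary: token -> polarity score.
BASE_LEXICON = {"good": 1.0, "support": 1.0, "bad": -1.0, "corrupt": -1.0}

# Hypothetical domain adjustments: new political terms, or terms whose
# polarity is re-weighted for election-related posts.
ADJUSTMENTS = {"scandal": -1.0, "bad": -0.5}


def adjust_lexicon(base, adjustments):
    """Return a new dictionary with the adjustments overriding base entries."""
    lexicon = dict(base)
    lexicon.update(adjustments)
    return lexicon


def post_score(text, lexicon):
    """Sum token polarities; unknown tokens score 0 (whitespace tokenisation)."""
    return sum(lexicon.get(tok, 0.0) for tok in text.lower().split())


def sentiment_shares(posts_by_candidate, lexicon):
    """Per-candidate share of positively scored posts, normalised to sum to 1."""
    positives = {
        cand: sum(1 for post in posts if post_score(post, lexicon) > 0)
        for cand, posts in posts_by_candidate.items()
    }
    total = sum(positives.values())
    if total == 0:
        # No positive posts at all: fall back to equal shares.
        return {cand: 1 / len(positives) for cand in positives}
    return {cand: n / total for cand, n in positives.items()}


def blend(poll, sentiment, weight):
    """Convex combination of poll and sentiment shares (weight in [0, 1]).

    Poll shares are assumed to be normalised among the listed candidates.
    """
    return {c: (1 - weight) * poll[c] + weight * sentiment[c] for c in poll}


def mean_abs_error(pred, actual):
    """Mean absolute error between predicted and actual vote shares."""
    return sum(abs(pred[c] - actual[c]) for c in actual) / len(actual)


if __name__ == "__main__":
    lexicon = adjust_lexicon(BASE_LEXICON, ADJUSTMENTS)
    # Toy posts standing in for crawled SNS data (invented for illustration).
    posts = {
        "A": ["good speech", "i support A", "corrupt scandal"],
        "B": ["support B", "bad answer"],
    }
    poll = {"A": 0.5, "B": 0.5}      # toy poll shares for two candidates
    actual = {"A": 0.56, "B": 0.44}  # toy election result
    shares = sentiment_shares(posts, lexicon)
    combined = blend(poll, shares, weight=0.3)
    print("poll-only MAE:", mean_abs_error(poll, actual))
    print("blended MAE:  ", mean_abs_error(combined, actual))
```

In practice the weight would have to be chosen, for example by calibrating on past elections. The toy numbers are invented only to exercise the functions; they are not data or results from the study.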
