Text Mining-based Fake News Detection Using News and Social Media Data

  • Hyun, Yoonjin (Graduate School of Business IT, Kookmin University)
  • Kim, Namgyu (School of Management Information Systems, Kookmin University)
  • Received : 2018.10.22
  • Accepted : 2018.11.18
  • Published : 2018.11.30

Abstract

Fake news has recently attracted worldwide attention across many fields. The Hyundai Research Institute estimated the damage caused by fake news at about 30.09 trillion won per year. The government has taken a first step toward developing core artificial intelligence technology for detecting fake news by holding an "Artificial Intelligence R&D Challenge" competition on the theme of "finding fake news," and fact-checking services are being offered in various fields in the private sector. In academia, too, fake news detection is being actively studied through expert-based, collective intelligence-based, artificial intelligence-based, and semantic-based approaches. However, the more sophisticated the manipulation, the harder it is to determine the authenticity of a news item by analyzing the news itself. Furthermore, the accuracy of most fake news detection models tends to be overestimated. In this study, we therefore first propose a method for evaluating the accuracy of fake news detection models fairly. Second, we propose a method for judging the authenticity of news using not only the content of the news but also the wide range of social data generated naturally in reaction to it.
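The idea of judging a news item from its text together with the social media reactions to it can be illustrated with a minimal sketch. This is not the authors' model: the multinomial Naive Bayes classifier, the whitespace tokenizer, the names `tokenize`, `combine`, and `NaiveBayes`, and the toy training data are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a real system would use a proper tokenizer)."""
    return text.lower().split()


def combine(news_text, tweets):
    """Merge an article and the tweets reacting to it into one token list."""
    tokens = tokenize(news_text)
    for tweet in tweets:
        tokens.extend(tokenize(tweet))
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (news text, reacting tweets, label).
training = [
    ("miracle cure shocks doctors", ["this is a hoax", "fake do not share"], "fake"),
    ("city council approves budget", ["confirmed by officials", "good news for schools"], "real"),
]
docs = [combine(news, tweets) for news, tweets, _ in training]
labels = [label for _, _, label in training]
model = NaiveBayes().fit(docs, labels)

print(model.predict(combine("secret cure revealed", ["hoax", "fake"])))
print(model.predict(combine("council budget vote", ["confirmed by officials"])))
```

In this sketch the tweets are simply appended to the article's tokens, so reaction words such as "hoax" contribute evidence that the article text alone may not contain. Other ways of combining the two sources, such as training separate models and merging their predictions, are equally plausible readings of the abstract.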

[Figure: Identification of News Authenticity]
[Figure: Accuracy of Fake News Detection on News Data Sampling]
[Figure: Examples of Fake News Detection Using Social Media Data]
[Figure: Research Overview]
[Figure: Example of Harsh Setting]
[Figure: Example of a Fake News Detection Model Using Twitter Data]
[Figure: Results of Harsh Setting (Part)]
[Figure: Accuracy Comparison of Fake News Detection Model]
[Figure: False Positive/Negative Analysis Results by Fake News Detection Model]
[Figure: Correct Prediction Frequency Analysis Results]
[Table: Virtual Prediction Results of a Fake News Detection Model Using News Data]
[Table: Virtual Prediction Results of a Fake News Detection Model Using Twitter Data]
[Table: Virtual Prediction Results of a Fake News Detection Model Using News and Twitter Data]
[Table: Prediction Results of Fake News Detection Model Using News Data (Part)]
[Table: Prediction Results of Fake News Detection Model Using Twitter Data (Part)]
[Table: Prediction Results of Fake News Detection Model Using News and Twitter Data (Part)]
