Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics

Construction of a Bankruptcy Prediction Model Using Qualitative Information Based on Big Data

  • 조남옥 (School of Business, Ewha Womans University);
  • 신경식 (School of Business, Ewha Womans University)
  • Received : 2016.05.03
  • Accepted : 2016.06.13
  • Published : 2016.06.30

Abstract

Many researchers have focused on developing bankruptcy prediction models with improved performance, using statistical methods such as multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques such as artificial neural networks (ANN), decision trees, and support vector machines (SVM). Most bankruptcy prediction models in academic studies have used financial ratios as their main input variables. Yet corporate bankruptcy is associated with both a firm's financial state and the external economic situation. Although relying solely on financial ratios has drawbacks, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed. Accounting information such as financial ratios is based on past data and is usually determined one year before bankruptcy, so a time lag exists between the closing of the financial statements and the point of credit evaluation. In addition, financial ratios do not capture environmental factors such as external economic conditions. Using only financial ratios may therefore be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past internal accounting information while neglecting recent information. Qualitative information should thus be added to the conventional bankruptcy prediction model to supplement accounting information. Because analytic mechanisms for obtaining and processing qualitative information from various sources have been lacking, previous studies have rarely used qualitative information. Recently, however, big data analytics such as text mining have drawn considerable attention in academia and industry as the amount of unstructured text data available on the web grows. A few previous studies have sought to adopt big data analytics in business prediction modeling.
Nevertheless, the use of qualitative information from the web for business prediction modeling is still at an early stage, restricted to a limited set of applications such as stock price prediction and movie revenue prediction. It is therefore necessary to apply big data analytics techniques such as text mining to a wider range of business prediction problems, including credit risk evaluation. Because unstructured text data are complex to manage and process, dedicated analytic methods are needed to handle qualitative information represented in unstructured text form. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms that uses both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as the mechanism for processing qualitative information. An industry-level sentiment index is extracted from a large volume of text data to quantify the external economic atmosphere represented in the media. The proposed method performs keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The lexicon is designed to represent sentiment for the construction business by considering the relationship between the occurrence of a term and the actual economic condition of the industry, rather than the term's inherent semantics. The experimental results showed that incorporating qualitative information based on big data analytics into the traditional, accounting-based bankruptcy prediction model is effective in enhancing predictive performance.
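
The abstract states that the lexicon scores a term by how its occurrence relates to the industry's actual economic condition rather than by the term's dictionary meaning. The Python sketch below illustrates one way such a lexicon could be derived; it is not the authors' procedure. It assumes each news article has already been tokenized and tagged with a hypothetical "good"/"bad" label for the industry's condition when it was published, and it scores each term by the add-one-smoothed log-odds of appearing in "good" versus "bad" articles. The function name `build_domain_lexicon`, its `min_count` and `threshold` parameters, and the toy articles are all illustrative assumptions.

```python
import math
from collections import Counter


def build_domain_lexicon(articles, min_count=2, threshold=0.5):
    """Hypothetical sketch of a domain-driven sentiment lexicon.

    articles: list of (tokens, condition) pairs; condition is an assumed
    label, "good" or "bad", for the industry's economic situation when the
    article was published.
    Returns {term: score}: score > 0 means the term leans toward good
    periods (positive), score < 0 toward bad periods (negative).
    """
    doc_freq = {"good": Counter(), "bad": Counter()}
    n_docs = Counter()
    for tokens, condition in articles:
        doc_freq[condition].update(set(tokens))  # count a term once per article
        n_docs[condition] += 1

    lexicon = {}
    for term in set(doc_freq["good"]) | set(doc_freq["bad"]):
        good, bad = doc_freq["good"][term], doc_freq["bad"][term]
        if good + bad < min_count:
            continue  # too rare to assign a reliable polarity
        # add-one smoothed share of articles containing the term, per condition
        p_good = (good + 1) / (n_docs["good"] + 2)
        p_bad = (bad + 1) / (n_docs["bad"] + 2)
        score = math.log(p_good / p_bad)
        if abs(score) >= threshold:  # drop terms that do not discriminate
            lexicon[term] = round(score, 3)
    return lexicon


# Toy, invented articles (already tokenized) for illustration only.
toy_articles = [
    (["orders", "rise", "housing"], "good"),
    (["orders", "recovery"], "good"),
    (["housing", "recovery", "orders"], "good"),
    (["unsold", "housing", "default"], "bad"),
    (["unsold", "default", "pf"], "bad"),
    (["unsold", "housing"], "bad"),
]
for term, score in sorted(build_domain_lexicon(toy_articles).items()):
    print(term, score)
```

In the toy data, "housing" appears equally often in good and bad periods and drops out, while "unsold" receives a negative score only because it co-occurs with bad periods. That is the idea behind a domain-driven lexicon: polarity comes from association with the industry's situation, not from general word meaning.
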
The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of bankruptcy prediction, because bankruptcy among construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field in that it reflects not only relatively recent information but also environmental factors such as external economic conditions.
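
To make the abstract's comparison concrete, the sketch below appends an industry-level negative-sentiment value to each firm's financial ratios and fits a logistic regression (logit analysis, one of the statistical methods named in the abstract) by plain batch gradient descent. The abstract does not give the study's exact classifiers, variables, or data; the two ratios (a debt ratio and ROA), the `neg_sentiment` values, every number, and the helper names `fit_logit` and `predict_proba` are invented for illustration.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Estimated probability of bankruptcy; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logit(X, y, lr=0.5, epochs=3000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = predict_proba(w, x) - target  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Invented firm-year records: [debt ratio, ROA] from financial statements.
financial = [[0.90, -0.05], [0.80, -0.02], [0.85, 0.00],
             [0.40, 0.06], [0.30, 0.08], [0.50, 0.05]]
# Invented industry-level negative-sentiment index for each record's period.
neg_sentiment = [0.70, 0.60, 0.65, 0.20, 0.30, 0.25]
bankrupt = [1, 1, 1, 0, 0, 0]

# Quantitative + qualitative input: append the sentiment variable to the ratios.
X = [ratios + [s] for ratios, s in zip(financial, neg_sentiment)]
w = fit_logit(X, bankrupt)
for x, label in zip(X, bankrupt):
    print(label, round(predict_proba(w, x), 3))
```

Calling `fit_logit(financial, bankrupt)` gives the accounting-only baseline. Only a comparison on held-out firms, not on six toy rows, could indicate whether the sentiment variable adds predictive value.
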

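Most studies on bankruptcy prediction have built their models by applying statistical methods or artificial intelligence techniques to financial variables. However, bankruptcy prediction models based on accounting information such as financial ratios have limitations: they do not account for the time lag between the closing date of the financial statements and the date of credit evaluation, and they can hardly reflect external environmental factors such as the economic conditions of the relevant industry. Although using only quantitative financial variables to predict whether a firm will go bankrupt has these limits, research that incorporates qualitative information into bankruptcy prediction models remains scarce. To improve the performance of conventional bankruptcy prediction models that use financial variables, this study proposes a model that uses big-data-based qualitative information as additional input variables. The performance gain of the proposed model depends on whether qualitative information can be transformed into a form suitable for integration into the prediction model. Accordingly, this study uses text mining, one of the big data analytics techniques, to process qualitative information. To extract sentiment about economic conditions from economic news related to the industry, we built a domain-specific sentiment lexicon and performed sentiment analysis based on it. Sentiment terms were extracted after text preprocessing, including morphological analysis, and each term was assigned a polarity and a sentiment score. The results show that adding qualitative information derived from economic news to the traditional bankruptcy prediction model improves model performance. In particular, negative sentiment about economic conditions proved more effective for predicting whether a firm will go bankrupt.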
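
The Korean-language abstract adds that news text was preprocessed (including morphological analysis), sentiment terms were extracted, and each term was given a polarity and a sentiment score. A minimal Python sketch of the scoring and aggregation step follows, under stated assumptions: a regular-expression word split stands in for a Korean morphological analyzer, the four-entry `LEXICON` is invented, and averaging each article's positive and negative totals within a period is an assumed way of forming the industry-level index, not necessarily the study's formula. The names `tokenize`, `score_article`, and `sentiment_index` are hypothetical.

```python
import re
from collections import defaultdict

# Invented toy lexicon: term -> signed sentiment score (polarity is the sign).
LEXICON = {"recovery": 1.0, "orders": 0.5, "unsold": -1.0, "default": -1.5}


def tokenize(text):
    """Stand-in for morphological analysis: lowercase alphabetic words only.
    A Korean pipeline would use a morphological analyzer here instead."""
    return re.findall(r"[a-z]+", text.lower())


def score_article(text, lexicon=LEXICON):
    """Return (positive, negative) totals of lexicon scores found in text."""
    pos = neg = 0.0
    for tok in tokenize(text):
        s = lexicon.get(tok, 0.0)
        if s > 0:
            pos += s
        elif s < 0:
            neg += -s  # store negative sentiment as a positive magnitude
    return pos, neg


def sentiment_index(articles):
    """articles: iterable of (period, text) for one industry.
    Returns {period: (positive_index, negative_index)}, each the mean
    per-article total in that period (an assumed aggregation)."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for period, text in articles:
        pos, neg = score_article(text)
        acc[period][0] += pos
        acc[period][1] += neg
        acc[period][2] += 1
    return {p: (a[0] / a[2], a[1] / a[2]) for p, a in acc.items()}


news = [
    ("2015Q1", "Housing orders show recovery"),
    ("2015Q1", "Unsold units pile up"),
    ("2015Q2", "Builder default fears grow as unsold stock rises"),
]
print(sentiment_index(news))
# {'2015Q1': (0.75, 0.5), '2015Q2': (0.0, 2.5)}
```

Keeping positive and negative totals separate, rather than netting them into one number, lets a prediction model use the negative index on its own, which fits the abstract's finding that negative sentiment was the more useful signal.
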

