Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending

An Exploratory Study on the Performance of the Synthetic Minority Over-sampling Technique for Predicting Good Borrowers in P2P Lending

  • Received : 2019.06.19
  • Accepted : 2019.09.20
  • Published : 2019.09.28

Abstract

This study aims to identify good borrowers in the context of P2P lending, a growing platform that allows individuals to lend money to and borrow money from each other. Credit risk is inherent in any loan and must be assessed before lending. In P2P lending specifically, traditional models fall short; this study seeks to address that shortfall and to explore the class imbalance problem found in credit risk data sets. To this end, we applied an over-sampling technique known as the Synthetic Minority Over-sampling Technique (SMOTE). To test our approach, we implemented five benchmark classifiers: support vector machine, logistic regression, k-nearest neighbor, random forest, and deep neural network. The data sample was drawn from the publicly available LendingClub dataset. The proposed SMOTE-based approach yielded significantly improved results compared with the benchmark classifiers. These results should help participants in P2P lending make better-informed decisions when selecting potential borrowers, reducing the higher risks present in P2P lending.

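This study proposes a synthetic minority over-sampling technique that is useful for predicting good borrowers on P2P lending platforms, and empirically verifies its performance. One of the problems in identifying good borrowers in P2P lending is that the classes are severely imbalanced, so good borrowers are difficult to predict unless the imbalance is addressed. To solve this problem, this study proposes SMOTE, the synthetic minority over-sampling technique, and verifies its performance by applying it to the LendingClub dataset. The results show that the SMOTE method achieved statistically superior performance compared with the support vector machine, k-nearest neighbor, logistic regression, random forest, and deep neural network classifiers.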
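
The abstract names SMOTE without describing how it works. As a rough illustration only, and not the authors' implementation, the core idea can be sketched in plain Python: pick a minority-class sample, pick one of its k nearest minority-class neighbours, and create a new point at a random position on the line segment between them. The function name `smote`, its parameters, and the toy data are assumptions made for this sketch. It is a simplified variant that chooses the base sample at random instead of generating a fixed number of synthetic points per minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (minority class).
    k: number of nearest minority-class neighbours considered per sample.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 minority rows that should be grown to 9.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
new_rows = smote(minority, n_synthetic=6, k=2)
balanced_minority = minority + new_rows
print(len(balanced_minority))  # prints 9
```

Because every synthetic row is a convex combination of two real minority rows, it always lies inside the region spanned by the existing minority samples. In practice, over-sampling of this kind is normally applied to the training split only, so that evaluation data contains real observations alone.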
