Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending

Costello, Francis Joseph; Lee, Kun Chang

doi:10.14400/JDC.2019.17.9.071

Journal of Digital Convergence (디지털융복합연구)

Volume 17, Issue 9 / Pages 71-78 / 2019 / pISSN 2713-6434 / eISSN 2713-6442

The Society of Digital Policy and Management (한국디지털정책학회)



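Korean title (translated): An Exploratory Study on the Performance of the Synthetic Minority Over-sampling Technique for Predicting Good Borrowers in P2P Lending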

Costello, Francis Joseph (SKK Business School, Sungkyunkwan University);
Lee, Kun Chang (Department of Global Business Administration / Department of Health Sciences & Technology, SAIHST, Sungkyunkwan University)

Received: 2019.06.19
Accepted: 2019.09.20
Published: 2019.09.28

https://doi.org/10.14400/JDC.2019.17.9.071



Abstract

This study aims to identify good borrowers in the context of P2P lending, a growing class of platforms that let individuals lend money to and borrow money from one another. Every loan carries borrower credit risk, which must be assessed before funds are lent. Traditional credit-risk models fall short in the P2P setting, so this study sets out to address that shortcoming and to explore the class imbalance typical of credit-risk data sets. To this end, we applied the Synthetic Minority Over-sampling Technique (SMOTE). To test the approach, we implemented five benchmark classifiers: support vector machines, logistic regression, k-nearest neighbor, random forest, and a deep neural network. The data sample was drawn from the publicly available LendingClub dataset. The proposed SMOTE approach produced significantly better results than the benchmark classifiers. These results should help participants in P2P lending make better-informed decisions when selecting potential borrowers, mitigating the elevated risks present in P2P lending.
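The abstract names SMOTE without describing how it works. The core idea (Chawla et al., 2002) is to create synthetic minority-class examples by interpolating between a minority sample x_i and one of its k nearest minority-class neighbours x_nn: x_new = x_i + λ(x_nn - x_i), with λ drawn uniformly from [0, 1). The following is a minimal pure-Python sketch under our own assumptions, not the authors' implementation: the function name `smote`, the default k = 5, Euclidean distance, and drawing base samples at random (rather than cycling through every minority sample) are illustrative choices, and the toy points are hypothetical rather than LendingClub data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic synthetic minority samples (illustrative sketch).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    # Precompute each sample's k nearest neighbours (Euclidean, excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base sample, drawn at random
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy example: 4 minority points, balanced up to a hypothetical majority of 10.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10
new = smote(minority, n_synthetic=majority_count - len(minority), k=3, seed=0)
balanced_minority = minority + new
print(len(balanced_minority))  # 10
```

Oversampling is normally applied only to the training split, so that synthetic points never reach the evaluation data. For real experiments, the imbalanced-learn library provides `imblearn.over_sampling.SMOTE` with a `fit_resample(X, y)` method.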

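This study proposes a synthetic minority over-sampling technique useful for predicting good borrowers on P2P lending platforms and empirically verifies its performance. One problem that arises when estimating good borrowers in P2P lending is that the imbalance between classes is severe, so good borrowers are difficult to predict unless this imbalance is addressed. To solve this problem, this study proposes SMOTE, the synthetic minority over-sampling technique, and verifies its performance by applying it to the LendingClub dataset. The results show that the SMOTE method achieved statistically superior performance compared with support vector machine, k-nearest neighbor, logistic regression, random forest, and deep neural network classifiers.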
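The difficulty described above can be seen with a toy calculation: on heavily imbalanced labels, a model that always predicts the majority class scores high accuracy while never identifying a single minority-class case, which is one reason resampling methods such as SMOTE are used. The 95:5 split and the helper names `accuracy` and `recall` below are hypothetical and are not taken from the LendingClub data or from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` cases that the model labels `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


# Hypothetical labels: 95 majority-class (0) and 5 minority-class (1) loans.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")         # accuracy = 0.95
print(f"minority recall = {recall(y_true, y_pred):.2f}")    # minority recall = 0.00
```

Because accuracy rewards the majority class, imbalanced credit data are usually evaluated with class-sensitive measures such as minority-class recall alongside accuracy.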


