ROC Function Estimation

  • Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University) ;
  • Lin, Mei Hua (Research Institute of Applied Statistics, Sungkyunkwan University) ;
  • Hong, Sun-Woo (Research Institute of Applied Statistics, Sungkyunkwan University)
  • Received : 2011.07
  • Accepted : 2011.10
  • Published : 2011.12.31

Abstract

From the point of view of credit evaluation, where the population is divided into default and non-default states, two methods are considered for estimating the conditional cumulative distribution functions: one assumes that the data follow a normal mixture distribution, and the other uses kernel density estimation (KDE). The parameters of the normal mixture are estimated with the EM algorithm. For kernel density estimation, five well-known kernel functions and four bandwidths are explored. The corresponding ROC functions are then obtained from the estimated distribution functions. The goodness of fit of the estimated distribution functions is discussed, and the performance of the resulting ROC functions is compared. The kernel-based distribution functions are found to fit better, while the ROC function obtained under the normal mixture assumption shows better performance.
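A minimal sketch of the normal-mixture route, assuming for illustration that a conditional score distribution is a two-component univariate normal mixture (the abstract does not state the number of components or the initialisation used). The names `em_normal_mixture` and `mixture_cdf` are hypothetical. The E-step computes each observation's posterior probability of belonging to each component, and the M-step re-estimates the weights, means and standard deviations from those probabilities; the fitted mixture CDF is the object an ROC function would then be built from.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def em_normal_mixture(x, n_iter=500, tol=1e-8):
    """Fit a two-component univariate normal mixture by EM (illustrative sketch).

    Assumption: exactly two components, initialised by splitting the sorted data in half.
    """
    xs = sorted(x)
    half = len(xs) // 2
    w = [0.5, 0.5]
    mu = [fmean(xs[:half]), fmean(xs[half:])]
    sd = [pstdev(xs[:half]) or 1.0, pstdev(xs[half:]) or 1.0]
    prev_ll = -math.inf
    for _ in range(n_iter):
        comps = [NormalDist(mu[k], sd[k]) for k in range(2)]
        # E-step: posterior responsibilities and the current log-likelihood.
        resp, ll = [], 0.0
        for xi in x:
            dens = [w[k] * comps[k].pdf(xi) for k in range(2)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and std devs.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            sd[k] = math.sqrt(max(var, 1e-12))  # guard against a collapsing component
        # Stop once the log-likelihood no longer increases noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sd


def mixture_cdf(c, w, mu, sd):
    """CDF of the fitted mixture: sum_k w_k * Phi((c - mu_k) / sd_k)."""
    return sum(wk * NormalDist(m, s).cdf(c) for wk, m, s in zip(w, mu, sd))


# Hypothetical simulated scores: 60% from N(0, 1), 40% from N(5, 1).
rng = random.Random(2011)
sample = [rng.gauss(0.0, 1.0) if rng.random() < 0.6 else rng.gauss(5.0, 1.0)
          for _ in range(1000)]
weights, means, sds = em_normal_mixture(sample)
print("weights:", weights, "means:", means, "sds:", sds)
```

The split-in-half start is only a crude initialisation; because EM converges to a local maximum of the likelihood, several starting values are commonly tried in practice.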

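The kernel route and the ROC step can be sketched in the same spirit. The abstract does not name its five kernels or four bandwidths, so this sketch assumes two common kernels (Gaussian and Epanechnikov) and Silverman's rule-of-thumb bandwidth; all function names are hypothetical. The smoothed CDF is F(c) = (1/n) Σ K((c - X_i)/h), where K is the integrated kernel and h the bandwidth. It further assumes that higher scores indicate default, so a threshold c gives the ROC point (1 - F_N(c), 1 - F_D(c)) with F_N and F_D the non-default and default CDFs, and it summarises performance by the trapezoidal AUC, which may not be the measure used in the paper.

```python
import random
from statistics import NormalDist, quantiles, stdev

_STD_NORMAL = NormalDist()  # standard normal, used for the integrated Gaussian kernel


def gaussian_kernel_cdf(u):
    """Integrated Gaussian kernel: the standard normal CDF."""
    return _STD_NORMAL.cdf(u)


def epanechnikov_kernel_cdf(u):
    """Integral of k(v) = 0.75 * (1 - v**2) on [-1, 1], from -1 up to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.25 * (2.0 + 3.0 * u - u ** 3)


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = stdev(data)
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(data) ** (-0.2)


def kde_cdf(data, h, kernel_cdf=gaussian_kernel_cdf):
    """Return the smoothed CDF c -> (1/n) * sum K((c - x_i) / h)."""
    n = len(data)

    def cdf(c):
        return sum(kernel_cdf((c - xi) / h) for xi in data) / n

    return cdf


def roc_points(cdf_default, cdf_non_default, thresholds):
    """ROC points (FPR, TPR), assuming a case is flagged as default when score > c."""
    pts = [(1.0 - cdf_non_default(c), 1.0 - cdf_default(c)) for c in thresholds]
    pts += [(0.0, 0.0), (1.0, 1.0)]  # end points of the curve
    return sorted(pts)  # both coordinates fall as c rises, so sorting orders the curve


def auc_trapezoid(pts):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical simulated scores: non-default ~ N(0, 1), default ~ N(2, 1).
rng = random.Random(7)
non_default = [rng.gauss(0.0, 1.0) for _ in range(500)]
default = [rng.gauss(2.0, 1.0) for _ in range(500)]
grid = [-5.0 + 0.05 * i for i in range(241)]  # thresholds from -5 to 7

h_n, h_d = silverman_bandwidth(non_default), silverman_bandwidth(default)
auc_gauss = auc_trapezoid(roc_points(kde_cdf(default, h_d), kde_cdf(non_default, h_n), grid))
auc_epa = auc_trapezoid(roc_points(
    kde_cdf(default, h_d, epanechnikov_kernel_cdf),
    kde_cdf(non_default, h_n, epanechnikov_kernel_cdf),
    grid,
))
print(f"AUC (Gaussian) = {auc_gauss:.3f}, AUC (Epanechnikov) = {auc_epa:.3f}")
```

For these simulated populations the population AUC is Φ(2/√2) ≈ 0.92, so both kernel estimates should land close to it; smoothing slightly inflates each group's variance, which tends to pull the estimate a little below that value.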