
Improving an Ensemble Model by Optimizing Bootstrap Sampling


  • Min, Sung-Hwan (Department of Business Administration, Hallym University)
  • Received : 2016.02.16
  • Accepted : 2016.03.23
  • Published : 2016.04.30

Abstract

Ensemble classification combines multiple classifiers to obtain predictions that are more accurate than those of any individual model, and ensemble learning techniques are well known to improve prediction accuracy. Bagging is one of the most popular ensemble learning techniques and has proven successful in raising the accuracy of individual classifiers. Bagging draws bootstrap samples from the training data, applies the base classifier to each bootstrap sample, and combines the resulting predictions into a final classification. Because bootstrap samples are simple random samples drawn from the original training data, they are not all equally informative, and this randomness introduces variance into the performance of the bagging model. In this study, we propose a new method that improves the performance of the standard bagging ensemble by optimizing its bootstrap samples. A genetic algorithm is used to optimize the bootstrap samples of the ensemble so as to improve its prediction accuracy. The proposed model is applied to a bankruptcy prediction problem using a real dataset of Korean companies, and the experimental results demonstrate its effectiveness.
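To make the procedure concrete, the following is a minimal sketch only, not the implementation used in the paper: it assumes scikit-learn decision trees as base classifiers, a synthetic dataset standing in for the Korean bankruptcy data, and illustrative GA settings (population size, number of generations, mutation rate). Each chromosome encodes the random seeds that generate the ensemble's bootstrap samples, and a simple genetic algorithm searches for the seed vector that maximizes the bagged ensemble's validation accuracy.

    # Minimal sketch (not the paper's implementation): bagging whose bootstrap
    # samples are chosen by a simple genetic algorithm. Base classifiers, dataset,
    # and GA settings below are illustrative assumptions only.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    N_CLASSIFIERS, POP_SIZE, N_GENERATIONS, MUTATION_RATE = 10, 20, 10, 0.1  # assumed values

    def build_ensemble(seeds, X_tr, y_tr):
        """One base classifier per bootstrap sample; each sample is fully
        determined by a seed, so a chromosome is a vector of seeds."""
        n = len(X_tr)
        models = []
        for s in seeds:
            idx = np.random.default_rng(int(s)).integers(0, n, size=n)  # bootstrap draw with replacement
            models.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))
        return models

    def ensemble_predict(models, X):
        votes = np.stack([m.predict(X) for m in models])  # majority vote over base classifiers
        return (votes.mean(axis=0) >= 0.5).astype(int)

    def fitness(seeds, X_tr, y_tr, X_val, y_val):
        # GA fitness = accuracy of the bagged ensemble on a held-out validation split
        return (ensemble_predict(build_ensemble(seeds, X_tr, y_tr), X_val) == y_val).mean()

    # Synthetic two-class data standing in for the Korean bankruptcy dataset
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    # Each individual encodes the seeds of the ensemble's bootstrap samples
    pop = rng.integers(0, 10**6, size=(POP_SIZE, N_CLASSIFIERS))
    for _ in range(N_GENERATIONS):
        scores = np.array([fitness(ind, X_tr, y_tr, X_val, y_val) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: POP_SIZE // 2]]   # truncation selection
        children = []
        for _ in range(POP_SIZE - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, N_CLASSIFIERS)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            if rng.random() < MUTATION_RATE:                       # mutation: redraw one seed
                child[rng.integers(N_CLASSIFIERS)] = rng.integers(0, 10**6)
            children.append(child)
        pop = np.vstack([parents, np.array(children)])

    best = max(pop, key=lambda ind: fitness(ind, X_tr, y_tr, X_val, y_val))
    print("Validation accuracy of GA-optimized bagging:",
          fitness(best, X_tr, y_tr, X_val, y_val))

In practice the chromosome encoding, fitness function, and genetic operators would follow the paper's own design; the sketch only illustrates how searching over which bootstrap samples are used differs from drawing them purely at random, as in standard bagging.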


Keywords


Cited by

  1. Development of a Defect Prediction Model Using Machine Learning for the Polyurethane Foaming Process of Automotive Seats, vol.22, pp.6, 2016, https://doi.org/10.5762/kais.2021.22.6.36