
Improving an Ensemble Model by Optimizing Bootstrap Sampling


  • Min, Sung-Hwan (Department of Business Administration, Hallym University)
  • Received : 2016.02.16
  • Accepted : 2016.03.23
  • Published : 2016.04.30

Abstract

Ensemble classification combines multiple classifiers to obtain predictions that are more accurate than those of any individual model, and ensemble learning techniques are well known to improve prediction accuracy. Bagging is one of the most popular ensemble learning techniques and has proven successful in raising the accuracy of individual classifiers. Bagging draws bootstrap samples from the training data, applies the base classifier to each bootstrap sample, and combines the resulting predictions into a final classification. Because bootstrap samples are simple random samples drawn from the original training data, they are not all equally informative, and this randomness introduces variance into the performance of the bagging model. In this study, we propose a new method that improves the performance of the standard bagging ensemble by optimizing its bootstrap samples. A genetic algorithm is used to optimize the bootstrap samples of the ensemble so as to improve its prediction accuracy. The proposed model is applied to a bankruptcy prediction problem using a real dataset of Korean companies, and the experimental results demonstrate its effectiveness.
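To make the procedure concrete, the following is a minimal sketch only, not the implementation used in the paper: it assumes scikit-learn decision trees as base classifiers, a synthetic dataset standing in for the Korean bankruptcy data, and illustrative GA settings (population size, number of generations, mutation rate). Each chromosome encodes the random seeds that generate the ensemble's bootstrap samples, and a simple genetic algorithm searches for the seed vector that maximizes the bagged ensemble's validation accuracy.

    # Minimal sketch (not the paper's implementation): bagging whose bootstrap
    # samples are chosen by a simple genetic algorithm. Base classifiers, dataset,
    # and GA settings below are illustrative assumptions only.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    N_CLASSIFIERS, POP_SIZE, N_GENERATIONS, MUTATION_RATE = 10, 20, 10, 0.1  # assumed values

    def build_ensemble(seeds, X_tr, y_tr):
        """One base classifier per bootstrap sample; each sample is fully
        determined by a seed, so a chromosome is a vector of seeds."""
        n = len(X_tr)
        models = []
        for s in seeds:
            idx = np.random.default_rng(int(s)).integers(0, n, size=n)  # bootstrap draw with replacement
            models.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))
        return models

    def ensemble_predict(models, X):
        votes = np.stack([m.predict(X) for m in models])  # majority vote over base classifiers
        return (votes.mean(axis=0) >= 0.5).astype(int)

    def fitness(seeds, X_tr, y_tr, X_val, y_val):
        # GA fitness = accuracy of the bagged ensemble on a held-out validation split
        return (ensemble_predict(build_ensemble(seeds, X_tr, y_tr), X_val) == y_val).mean()

    # Synthetic two-class data standing in for the Korean bankruptcy dataset
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    # Each individual encodes the seeds of the ensemble's bootstrap samples
    pop = rng.integers(0, 10**6, size=(POP_SIZE, N_CLASSIFIERS))
    for _ in range(N_GENERATIONS):
        scores = np.array([fitness(ind, X_tr, y_tr, X_val, y_val) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: POP_SIZE // 2]]   # truncation selection
        children = []
        for _ in range(POP_SIZE - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, N_CLASSIFIERS)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            if rng.random() < MUTATION_RATE:                       # mutation: redraw one seed
                child[rng.integers(N_CLASSIFIERS)] = rng.integers(0, 10**6)
            children.append(child)
        pop = np.vstack([parents, np.array(children)])

    best = max(pop, key=lambda ind: fitness(ind, X_tr, y_tr, X_val, y_val))
    print("Validation accuracy of GA-optimized bagging:",
          fitness(best, X_tr, y_tr, X_val, y_val))

In practice the chromosome encoding, fitness function, and genetic operators would follow the paper's own design; the sketch only illustrates how searching over which bootstrap samples are used differs from drawing them purely at random, as in standard bagging.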


Keywords


Cited by

  1. Development of a Defect Prediction Model Using Machine Learning for the Polyurethane Foaming Process of Automotive Seats, vol.22, pp.6, 2016, https://doi.org/10.5762/kais.2021.22.6.36