Generation and Selection of Nominal Virtual Examples for Improving the Classifier Performance


  • 이유정 (Department of Computer Engineering, Pusan National University);
  • 강병호 (Department of Computer Engineering, Pusan National University);
  • 강재호 (Yahoo Korea Search R&D Center);
  • 류광렬 (Department of Computer Engineering, Pusan National University)
  • Published : 2006.12.15

Abstract

This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples focused on data with numeric attributes and relied on domain-specific knowledge to generate virtual examples useful to one particular learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naive Bayesian network constructed from the given training set. A sampled example is considered useful if it contributes to an increase in the network's conditional likelihood when added to the training set. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that virtual examples collected this way can help various learning algorithms derive classifiers of improved accuracy.

This paper proposes a method of improving classification performance on data with nominal attributes by using virtual examples generated and evaluated with a Bayesian network. Previous studies on virtual examples mostly targeted data with numeric attributes and aimed to improve the performance of a specific learning algorithm by exploiting knowledge specialized to the target domain. Instead of relying on domain-specific knowledge, our method generates nominal virtual examples from a naive Bayesian network built from the given training set, and selects an example as useful if it contributes to increasing the network's conditional likelihood. By repeating this generation-and-selection process, a virtual example set of appropriate size is collected and used. Experiments on nominal-attribute data confirmed that the performance of various learning models improves.
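The sample-then-select loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Laplace smoothing constant, the tiny toy dataset, and the choice to measure conditional likelihood on the original training set after refitting with each candidate are all assumptions made for the sketch.

```python
import math
import random
from collections import Counter

def fit_nb(examples, attr_values, classes, alpha=1.0):
    """Fit a Laplace-smoothed naive Bayes model over nominal attributes.
    examples: list of (attribute_tuple, class_label) pairs."""
    n = len(examples)
    class_counts = Counter(label for _, label in examples)
    prior = {c: (class_counts[c] + alpha) / (n + alpha * len(classes))
             for c in classes}
    cond = {}  # cond[(j, c)] maps attribute value v -> P(x_j = v | class c)
    for j, values in enumerate(attr_values):
        for c in classes:
            counts = Counter(x[j] for x, label in examples if label == c)
            total = class_counts[c] + alpha * len(values)
            cond[(j, c)] = {v: (counts[v] + alpha) / total for v in values}
    return prior, cond

def cond_log_likelihood(model, examples):
    """Conditional log-likelihood: sum over examples of log P(class | attrs)."""
    prior, cond = model
    cll = 0.0
    for x, label in examples:
        joint = {c: prior[c] * math.prod(cond[(j, c)][v]
                                         for j, v in enumerate(x))
                 for c in prior}
        cll += math.log(joint[label] / sum(joint.values()))
    return cll

def collect_virtual_examples(train, attr_values, classes,
                             n_wanted=10, max_tries=300, rng=None):
    """Repeatedly sample a (attributes, class) pair from the current model and
    keep it only if refitting with it raises the CLL on the original data."""
    rng = rng or random.Random(0)
    pool, virtual = list(train), []
    best = cond_log_likelihood(fit_nb(pool, attr_values, classes), train)
    for _ in range(max_tries):
        if len(virtual) >= n_wanted:
            break
        prior, cond = fit_nb(pool, attr_values, classes)
        # Sample a class from the prior, then each attribute value
        # from its class-conditional distribution.
        c = rng.choices(classes, weights=[prior[k] for k in classes])[0]
        x = tuple(rng.choices(list(cond[(j, c)]),
                              weights=list(cond[(j, c)].values()))[0]
                  for j in range(len(attr_values)))
        cll = cond_log_likelihood(fit_nb(pool + [(x, c)], attr_values, classes),
                                  train)
        if cll > best:  # "useful": the candidate improved the network's CLL
            best, pool = cll, pool + [(x, c)]
            virtual.append((x, c))
    return virtual
```

Because a candidate is kept only when it strictly improves the conditional log-likelihood, the accepted set never degrades this criterion; the collected virtual examples can then be appended to the training set of any downstream learner.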

References

  1. Quinlan, J. R., C4.5 : Programs for Machine Learning, Morgan Kaufmann Publishers, 1993
  2. Aha, D. and Kibler, D., 'Instance-based Learning Algorithms,' Machine Learning, Vol.6, pp. 37-66, 1991 https://doi.org/10.1007/BF00153759
  3. Breiman, L., 'Bagging Predictors,' Machine Learning, Vol.24, No.2, pp. 123-140, 1996 https://doi.org/10.1023/A:1018054314350
  4. Freund, Y. and Schapire, R. E., 'Experiments with a New Boosting Algorithm,' Proc. of the 13th International Conference on Machine Learning, pp. 148-156, 1996
  5. Wolpert, D. H., 'Stacked Generalization,' Neural Networks, Vol.5, pp. 241-259, 1992 https://doi.org/10.1016/S0893-6080(05)80023-1
  6. Aha, D. W., 'Tolerating Noisy, Irrelevant, and Novel Attributes in Instance-based Learning Algorithms,' International Journal of Man-Machine Studies, Vol.36, No.2, pp. 267-287, 1992 https://doi.org/10.1016/0020-7373(92)90018-G
  7. Kohavi, R. and Sahami, M., 'Error-based and Entropy-based Discretization of Continuous Features,' Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 114-119, 1996
  8. Pazzani, M., 'Constructive induction of Cartesian product attributes,' Information, Statistics and Induction in Science, pp. 66-77, 1996
  9. Almuallim, H. and Dietterich, T. G., 'Learning With Many Irrelevant Features,' Proc. of the 9th National Conference on Artificial Intelligence, pp. 547-552, 1991
  10. Greiner, R. and Zhou, W., 'Structural Extension to Logistic Regression: Discriminative parameter learning of belief net classifiers,' Proc. of the 18th National Conference on Artificial Intelligence, pp. 167-173, 2002
  11. Grossman, D. and Domingos, P., 'Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood,' Proc. of the 21st International Conference on Machine Learning, pp. 361-368, 2004
  12. John, G. and Langley, P., 'Estimating Continuous Distributions in Bayesian Classifiers,' Proc. of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 338-345, 1995
  13. Scholkopf, B., Burges, C. J. C. and Smola, A. J., Advances in Kernel Methods: Support Vector Learning, MIT Press, 1998
  14. Sietsma, J. and Dow, R. J. F., 'Creating Artificial Neural Networks that Generalize,' Neural Networks, Vol.4, pp. 67-79, 1991 https://doi.org/10.1016/0893-6080(91)90033-2
  15. Cho, S. and Cha, K., 'Evolution of Neural Network Training Set through Addition of Virtual Samples,' Proc. of the 1996 IEEE International Conference on Evolutionary Computation, pp. 685-688, 1996 https://doi.org/10.1109/ICEC.1996.542684
  16. Cho, S., Jang, M. and Chang, S., 'Virtual Sample Generation using a Population of Networks,' Neural Processing Letters, Vol.5, No.2, pp. 83-89, 1997 https://doi.org/10.1023/A:1009653706403
  17. 김종성, 'A Virtual Example Generation Method for Improving Classification Performance,' Master's Thesis, Pusan National University, 2004
  18. 이유정, 강병호, 강재호, 류광렬, 'Improving the Performance of a Naive Bayes Classifier Using Virtual Examples,' Proc. of the 32nd KIISE Fall Conference, Vol.32, No.2, pp. 655-657, 2005
  19. Burges, C. and Scholkopf, B., 'Improving the Accuracy and Speed of Support Vector Machines,' Advances in Neural Information Processing Systems, Vol.9, No.7, 1997
  20. Ryu, Y. S. and Oh, S. Y., 'SIMPLE Hybrid Classifier for Face Recognition with Adaptively Generated Virtual Data,' Pattern Recognition Letters, 2002 https://doi.org/10.1016/S0167-8655(01)00159-3
  21. 김종성, 박태진, 강재호, 백납철, 강원회, 이상협, 류광렬, 'Character Recognition of Vehicle License Plates Using Merged Examples,' Proc. of the 2004 KIISE Fall Conference (I), Vol.31, No.2, pp. 238-240, 2004
  22. 이경순, 안동언, 'Performance Improvement Using a Virtual Document Technique in Text Classification,' Journal of the Korea Information Processing Society, Vol.11-B, No.4, pp. 501-508, 2004 https://doi.org/10.3745/KIPSTB.2004.11B.4.501
  23. Newman, D. J., Hettich, S., Blake, C. L. and Merz, C. J., UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html], CA: University of California, Department of Information and Computer Science, Irvine, 1998
  24. Weka 3 - Data Mining with Open Source Machine Learning Software in Java http://www.cs.waikato.ac.nz/~ml/weka
  25. Witten, I. H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, 1999