Under Sampling for Imbalanced Data using Minor Class based SVM (MCSVM) in Semiconductor Process

  • Pak, Sae-Rom (School of Industrial Management Engineering, Korea University) ;
  • Kim, Jun Seok (School of Industrial Management Engineering, Korea University) ;
  • Park, Cheong-Sool (School of Industrial Management Engineering, Korea University) ;
  • Park, Seung Hwan (School of Industrial Management Engineering, Korea University) ;
  • Baek, Jun-Geol (School of Industrial Management Engineering, Korea University)
  • Received : 2014.04.28
  • Accepted : 2014.05.29
  • Published : 2014.08.15

Abstract

Yield prediction is important for managing semiconductor quality. Many studies have applied machine learning algorithms such as the support vector machine (SVM) to predict yield precisely. However, yield prediction with SVM is difficult because the final test procedure in the semiconductor manufacturing process generates large, extremely imbalanced data. Training an SVM on imbalanced data can produce unnecessary support vectors from the major class, because support vectors from the minor class go unselected. As a result, the decision boundary around the target class can be overwhelmed by observations from the major class. For this reason, we propose an under-sampling method based on the minor class based SVM (MCSVM), which overcomes these limitations of the ordinary SVM algorithm. MCSVM constructs a model that fixes some observations from the minor class as support vectors, so that they serve as good samples representing the nature of the target class. Experiments on data sets from UCI and from a real manufacturing process show that the proposed method performs better than existing sampling methods.
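The abstract does not give the MCSVM formulation, so the following is not the authors' method. It is only a minimal sketch of a simpler, minority-guided under-sampling heuristic, meant to illustrate the general idea of letting the minor class decide which major-class observations are retained. The function name `undersample_near_minority`, the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def undersample_near_minority(X, y, minor_label, k=2):
    """Keep every minority point plus, for each one, its k nearest majority points.

    Hypothetical stand-in heuristic for illustration; NOT the MCSVM of the paper.
    """
    minor_idx = [i for i, label in enumerate(y) if label == minor_label]
    major_idx = [i for i, label in enumerate(y) if label != minor_label]

    kept_major = set()
    for i in minor_idx:
        # Rank majority points by Euclidean distance to this minority point.
        nearest = sorted(major_idx, key=lambda j: math.dist(X[i], X[j]))
        kept_major.update(nearest[:k])

    keep = sorted(minor_idx + list(kept_major))
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): 200 majority points (label 0), 5 minority points (label 1).
rng = random.Random(0)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
X += [(rng.gauss(2.5, 0.3), rng.gauss(2.5, 0.3)) for _ in range(5)]
y = [0] * 200 + [1] * 5

X_small, y_small = undersample_near_minority(X, y, minor_label=1, k=2)
print(len(y), "->", len(y_small), "observations;",
      y_small.count(1), "minority points kept")
```

The reduced set could then be passed to any SVM trainer. Because only major-class observations close to the minor class are retained, the remaining data is less dominated by the major class, which is the effect the abstract attributes to under-sampling in general.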
