Unsupervised Learning Model for Fault Prediction Using Representative Clustering Algorithms

  • Received : 2013.11.25
  • Accepted : 2014.01.26
  • Published : 2014.02.28

Abstract

Most previous studies of software fault prediction models, which determine the fault-proneness of input modules, have focused on supervised learning models that use a training data set. However, an unsupervised learning model is needed when a supervised model cannot be applied: either no past training data set exists, or one exists but the current project is of a different type. Because building an unsupervised model is difficult, only a few such studies exist. In this paper, we build unsupervised models using two representative clustering algorithms, EM and DBSCAN, that have not been used in prior studies, and compare them with the K-means model used in previous work. The results show that the EM model performs slightly better than the K-means model in terms of error rate, and that both significantly outperform the DBSCAN model.
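
The general idea of clustering-based fault prediction without training data can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain K-means (the baseline mentioned in the abstract) rather than EM or DBSCAN, the module metrics and fault labels are hypothetical, and the rule "the cluster with the larger centroid is fault-prone" is an assumption made for illustration, since the abstract does not say how clusters are labeled.

```python
import random

# Hypothetical modules as (lines_of_code, cyclomatic_complexity); not data from the paper.
MODULES = [(20, 2), (35, 3), (50, 4), (40, 2), (30, 3), (45, 3),
           (300, 25), (280, 30), (350, 22)]
# Hypothetical ground truth, used only to evaluate the prediction afterwards.
ACTUAL = [False, False, False, False, False, True, True, True, True]


def sq_dist(p, q):
    """Squared Euclidean distance between two metric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's K-means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assign = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # no change, so the clustering has converged
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign, centroids


def predict_fault_prone(points, seed=0):
    """Cluster into two groups and flag the one with larger metric values.

    Assumption for this sketch: larger, more complex modules are more
    likely to be faulty, so the cluster with the larger centroid sum is
    labeled fault-prone. No fault labels are used here.
    """
    assign, centroids = kmeans(points, 2, seed=seed)
    risky = max(range(2), key=lambda c: sum(centroids[c]))
    return [a == risky for a in assign]


def error_rate(predicted, actual):
    """Fraction of modules whose predicted label differs from the actual one."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


predicted = predict_fault_prone(MODULES)
print(f"error rate: {error_rate(predicted, ACTUAL):.3f}")
```

In this toy data one small module is actually faulty, so a size-based clustering misses it; that kind of misclassification is what an error rate measures. With real metrics the features would normally be normalized first, because otherwise the attribute with the largest scale (here, lines of code) dominates the Euclidean distance.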
