Ambiguity Analysis of Defectiveness in NASA MDP Data Sets

  • Received : 2013.01.27
  • Accepted : 2013.06.06
  • Published : 2013.06.30

Abstract

Public-domain defect data sets, such as the NASA data sets available from the NASA MDP and PROMISE repositories, make it possible to compare the results of different defect prediction models on the same data. This in turn makes it possible to build repeatable and generalizable prediction models. However, some recent studies have questioned the quality of two versions of the NASA data sets and have produced new, cleaned data sets by applying their own data cleaning processes. We find that the NASA MDP versions offer two ways to determine the defectiveness of a module (0 or 1), and that the two results differ in some cases. To our knowledge, this serious problem has not been addressed in previous studies. To handle this ambiguity, we define two kinds of module defectiveness and two conditions that can be used to identify the ambiguous cases. We analyze 5 of the 13 NASA projects in detail using our ambiguity analysis method. The results show that JM1 and PC4 are the best projects, with few ambiguous cases.
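The core idea of flagging modules whose two defectiveness values disagree can be sketched in a few lines. This is only a minimal illustration, not the paper's method: the abstract does not state the two definitions of defectiveness or the two conditions, so the field names `label_a` and `label_b`, the helper names, and the sample records are all hypothetical.

```python
def ambiguous_modules(modules):
    """Return the ids of modules whose two 0/1 defectiveness labels disagree.

    `label_a` and `label_b` are hypothetical stand-ins for the two ways of
    deriving defectiveness; the paper's actual definitions are not given here.
    """
    return [m["id"] for m in modules if m["label_a"] != m["label_b"]]


def ambiguity_rate(modules):
    """Fraction of modules in a project that are ambiguous (0.0 if empty)."""
    if not modules:
        return 0.0
    return len(ambiguous_modules(modules)) / len(modules)


# Made-up records for illustration only; these are not NASA MDP data.
sample = [
    {"id": "m1", "label_a": 0, "label_b": 0},
    {"id": "m2", "label_a": 1, "label_b": 0},
    {"id": "m3", "label_a": 1, "label_b": 1},
    {"id": "m4", "label_a": 0, "label_b": 1},
]

print(ambiguous_modules(sample))
print(ambiguity_rate(sample))
```

Under these assumptions, a project with a low ambiguity rate would correspond to what the abstract describes as a project with few ambiguous cases.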
