Combining Instance Transfer and Feature-Representation Transfer for Cross-Project Defect Prediction

Authors

Engineering Faculty, Computer Engineering Group, Yazd University, Yazd, Iran

Abstract

Software defect prediction is critical for improving software quality, because it allows limited testing resources to be allocated to fault-prone modules rather than to all software modules. In within-project defect prediction, the prediction model is usually built from a local labeled dataset. For projects without local labeled data, however, building such a model is almost impossible. Cross-project defect prediction was therefore proposed to train the prediction model on data from other projects. In cross-project defect prediction, the training and test data follow different distributions, so research in this area has focused on reducing the negative impact of this distribution mismatch. In this research, the Knowledge Estimation Interval (KEI) method is proposed. KEI selects the training instances whose distribution is similar to that of the test set, and the selected instances are then used to train the prediction model. To increase the effectiveness of the approach, feature extraction techniques are applied to the training and test sets before KEI. Evaluation on 10 datasets from NASA and SoftLab, measured by AUC, indicates that the approach is effective at predicting fault-prone modules. On average, the proposed method improves accuracy by 38.1% compared with within-project defect prediction models.
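The abstract does not specify how KEI measures distributional similarity, so the sketch below does not reproduce KEI. It illustrates the general pipeline (shared representation first, then selection of source instances that resemble the target project) with a well-known stand-in: the nearest-neighbour (NN) filter often used in cross-project defect prediction, preceded by a shared z-score scaling in place of the paper's feature-extraction step. The function names `standardize` and `nn_filter` and the parameter `k` are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Set, Tuple

Matrix = List[List[float]]


def standardize(source: Matrix, target: Matrix) -> Tuple[Matrix, Matrix]:
    """Z-score each feature using the mean and std of source and target combined.

    Stand-in for the feature-extraction step: it only puts both projects on a
    common scale; it does not learn new features.
    """
    combined = source + target
    n_rows, n_features = len(combined), len(combined[0])
    means = [sum(row[j] for row in combined) / n_rows for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in combined) / n_rows
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid division by zero

    def scale(rows: Matrix) -> Matrix:
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]

    return scale(source), scale(target)


def nn_filter(source: Matrix, target: Matrix, k: int = 10) -> List[int]:
    """Return sorted indices of source rows that are among the k nearest
    neighbours (Euclidean distance) of at least one target row."""
    selected: Set[int] = set()
    for t in target:
        # Rank every source row by its distance to this target row.
        ranked = sorted(range(len(source)), key=lambda i: math.dist(source[i], t))
        selected.update(ranked[:k])
    return sorted(selected)


if __name__ == "__main__":
    # Toy source project: two clusters of modules; target modules lie near both.
    src = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]]
    tgt = [[0.2, 0.1], [10.8, 11.0]]
    s, t = standardize(src, tgt)
    print(nn_filter(s, t, k=1))  # → [0, 3]
```

Only the selected source rows (and their labels) would then be passed to the defect classifier. The filter keeps whole instances unchanged, which is what makes it an instance-transfer method; the preceding scaling is the part that plays the role of a shared feature representation.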

Keywords