This is the replication package for the systematic literature review "Categorizing software repositories: Are we there yet?".
The file DATA EXTRACTION.xlsx contains the data extracted from all the papers reviewed in our survey. Each row corresponds to one reviewed paper and records information about various aspects of that paper.
- The RQ1 directory contains data related to RQ1, including charts and tables for the three sub-RQs. The tables provide specific information on three aspects of the data (i.e., data source, data type, and classification principles) for each paper.
- The RQ2 directory contains data related to RQ2, including charts and tables for the three sub-RQs. The tables provide specific information on three aspects of the features (i.e., preprocessing methods, feature representation, and feature combination) for each paper.
- The RQ3 directory contains data related to RQ3, including charts and tables for the three sub-RQs. The tables provide specific information on three aspects of the classification approach (i.e., classification methods, hyperparameter tuning methods, and the architecture of pre-trained model (PTM)-based categorization methods) for each paper.
- The RQ4 directory contains data related to RQ4, including charts and tables for the three sub-RQs. The tables provide specific information on three aspects of performance evaluation (i.e., validation methods, statistical testing, and evaluation metrics) for each paper.