This repository contains six public datasets from our research on predicting developer expertise in serverless functions. The article is available at https://cke.um.ac.ir/article_45050.html. The datasets were built from features extracted from GitHub and Stack Overflow, with one dataset for each of our target languages.
We found that integrating data from multiple platforms, such as GitHub and Stack Overflow, provides an in-depth understanding of developer expertise. One tangible output of this research is the public release of these six language-specific datasets. By making them publicly available, we aim to contribute to the academic community and foster further research in this area.
Each dataset is provided in CSV format with the following columns:
- number_of_commits
- commits_client_files
- commits_import_library
- code_churn
- code_churn_client_files
- imports
- days_since_first_import
- days_since_last_import
- days_between_imports
- avg_days_commits_client_files
- avg_days_commits_import_library
- projects
- projects_import
- Num_Answers
- Num_Accepted_Answers
- Num_Upvotes
- Avg_Score_Per_Answer
- Num_Questions
- Num_First_Answers
- Tag_Score
- time_of_activity
- rating: This is our label, which came from a survey (sent by email) that asked developers to rate themselves.
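As a rough illustration of how the columns above might be used, the sketch below reads one dataset with Python's standard `csv` module and separates the `rating` label from the feature columns. It assumes each CSV has a header row with the column names listed above. The helper `load_dataset` and the sample rows are hypothetical and made up for the example; they are not taken from the datasets.

```python
import csv
import io

LABEL_COLUMN = "rating"  # label column named in this README


def load_dataset(handle):
    """Read one dataset CSV and split each row into features and label.

    Assumes a header row naming the columns. Values are left as strings,
    because the README does not state the types or the rating scale.
    """
    reader = csv.DictReader(handle)
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(LABEL_COLUMN))  # take the label out of the row
        features.append(row)                  # remaining columns are features
    return features, labels


# Tiny in-memory stand-in with made-up values and only three of the columns.
sample = io.StringIO(
    "number_of_commits,Num_Answers,rating\n"
    "12,3,4\n"
    "5,0,2\n"
)
features, labels = load_dataset(sample)
print(labels)
print(features[0])
```

With a real file, the same helper can be called inside `with open(path, newline="") as handle:`, where `path` points to one of the six CSV files in this repository.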
We hope these datasets will be useful for your research. If you use them, please consider citing our paper.