Description
Link: https://doi.org/10.1016/j.knosys.2024.111492
Main problem
Previous automated software team selection methods suffer from three key limitations: (1) features computed directly from team members reflect researchers' subjective choices and may not generalize across datasets, (2) local features extracted from individual team members ignore global collaboration patterns across the whole software system, and (3) the methods overlook the temporal aspect of open-source software development, where collaboration patterns, technical skills, and project structures evolve over time. Traditional approaches such as RECAST rely on manually crafted features that may overfit to specific datasets and fail to capture the dynamic nature of software development.
Proposed method
TeReKG (Software Team Recommendation Using Knowledge Graph Embedding) is a framework that constructs temporal collaborative knowledge graphs to model software development history. The approach has four parts: (1) building heterogeneous knowledge graphs with six sub-graphs (User-Role, User-Expertise, User-Collaboration, Task-Dependency, Task-Locality, Task-Creation-Date) that capture relationships between tasks, projects, users, components, and temporal information, (2) applying knowledge graph embedding algorithms (TransE, DistMult, ComplEx, HolE, ConvKB, GNN) to learn global collaboration patterns automatically, (3) using a link prediction protocol with collaboration-prioritization re-ranking to recommend candidates for specific roles, and (4) using the Max-Logit algorithm to enumerate optimal team configurations.
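To make the link prediction step concrete, here is a minimal sketch of ranking candidates for a role with the standard TransE score, i.e. the negative distance between (head + relation) and tail. The entity names, the `has_developer` relation, and the two-dimensional embeddings are hypothetical toy values, not taken from the paper; in practice the embeddings would be learned from the knowledge graph. The sketch leaves out the collaboration-prioritization re-ranking and the Max-Logit team enumeration.

```python
import math

# Hypothetical toy embeddings (assumption); real ones are learned by a KGE model.
ENTITY_EMB = {
    "task_42": [0.0, 0.0],
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
    "carol": [0.9, 0.2],
}
# Hypothetical relation name (assumption), standing in for a task-to-role link.
RELATION_EMB = {
    "has_developer": [1.0, 0.0],
}


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    h = ENTITY_EMB[head]
    r = RELATION_EMB[relation]
    t = ENTITY_EMB[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_candidates(task, relation, candidates):
    """Order candidate users from most to least plausible for (task, relation, ?)."""
    return sorted(
        candidates,
        key=lambda user: transe_score(task, relation, user),
        reverse=True,
    )


ranking = rank_candidates("task_42", "has_developer", ["alice", "bob", "carol"])
print(ranking)
```

With these toy vectors, `alice` sits exactly at `task_42 + has_developer`, so she ranks first; swapping `transe_score` for another scoring function (DistMult, HolE, and so on) would leave the ranking procedure unchanged.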
My Summary
TeReKG replaces manually crafted local features with global patterns extracted automatically through knowledge graph embeddings for software team recommendation. The framework addresses the subjectivity and locality issues of previous approaches while incorporating the temporal dynamics of software development. Experimental results on three popular open-source datasets (Moodle, Apache, Atlassian) show that TeReKG outperforms state-of-the-art baselines in both single-role and team recommendation tasks.
The evaluation shows that HolE knowledge graph embedding achieves the best performance across datasets, with TeReKG consistently outperforming RECAST and other baselines across multiple evaluation metrics including MRR, Hit@K, and MAP.
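For reference, the ranking metrics named above can be computed as in the sketch below, which uses the textbook definitions of MRR, Hit@K, and MAP. The two example queries and their candidate names are made up for illustration, and the paper's exact evaluation protocol (for example, how ties or filtered candidates are handled) may differ.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            return 1.0 / position
    return 0.0


def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(c in relevant for c in ranked[:k]) else 0.0


def average_precision(ranked, relevant):
    """Mean of precision values taken at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical queries (assumption): (ranked recommendations, ground-truth members).
queries = [
    (["alice", "bob", "carol"], {"bob"}),
    (["dave", "erin", "frank"], {"dave", "frank"}),
]

mrr = mean(reciprocal_rank(r, rel) for r, rel in queries)
hit_at_1 = mean(hit_at_k(r, rel, 1) for r, rel in queries)
map_score = mean(average_precision(r, rel) for r, rel in queries)
print(mrr, hit_at_1, map_score)
```

Each metric is averaged over queries, so one query with a poorly ranked ground-truth member lowers MRR and MAP even when the other query is answered perfectly.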
The key innovation lies in automatically learning global collaboration behaviors through knowledge graph embeddings rather than relying on researcher-defined features, while the temporal modeling captures the evolving nature of software development. This approach demonstrates the potential of knowledge graphs for capturing global patterns in software engineering applications.
Datasets
Moodle Dataset: 88,655 issues, 450 developers, 195 testers, 133 reviewers, 16 integrators (data period: 2002/09/05 - 2019/05/22)
Apache Dataset: 507,319 issues, 2,265 developers, 41 reviewers (data period: 2002/04/03 - 2019/07/23)
Atlassian Dataset: 238,322 issues, 39 developers, 21 testers, 127 reviewers (data period: 2004/11/20 - 2019/03/29)