2024-KBS-TeReKG A Temporal Collaborative Knowledge Graph Framework for Software Team Recommendation #295

@thangk

Link: https://doi.org/10.1016/j.knosys.2024.111492

Main problem

Previous automated software team selection methods suffer from three key limitations: (1) features computed directly from team members reflect researchers' subjective choices and may not generalize across datasets, (2) local features extracted from individual team members ignore global collaboration patterns across the whole software system, and (3) existing methods overlook the temporal aspect of open-source software development, in which collaboration patterns, technical skills, and project structures evolve over time. Traditional approaches such as RECAST rely on manually crafted features that may overfit to specific datasets and fail to capture the dynamic nature of software development.

Proposed method

TeReKG (Software Team Recommendation Using Knowledge Graph Embedding) is a framework that constructs temporal collaborative knowledge graphs to model software development history. The approach involves: (1) building heterogeneous knowledge graphs from six sub-graphs (User-Role, User-Expertise, User-Collaboration, Task-Dependency, Task-Locality, Task-Creation-Date) that capture relationships between tasks, projects, users, components, and temporal information, (2) using knowledge graph embedding algorithms (TransE, DistMult, ComplEx, HolE, ConvKB, GNN) to learn global collaboration patterns automatically, (3) applying a link prediction protocol with collaboration-prioritization re-ranking to recommend candidates for specific roles, and (4) using the Max-Logit algorithm to enumerate optimal team configurations.
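
To make the link prediction step more concrete, here is a minimal sketch of how a translation-based scorer such as TransE could rank candidate users for a role on a task. It is not the paper's implementation: the entity names, the `assigned_developer` relation, and the hand-set 2-d vectors are all hypothetical stand-ins for trained embeddings, and the collaboration-prioritization re-ranking and Max-Logit steps are left out.

```python
import math

# Hypothetical hand-set 2-d embeddings standing in for trained vectors.
# In practice these would be learned from the knowledge graph triples.
entity_emb = {
    "task:T1": (0.0, 0.0),
    "user:alice": (1.0, 0.1),
    "user:bob": (0.2, 1.0),
    "user:carol": (-1.0, -1.0),
}
# Hypothetical relation name, used only for illustration.
relation_emb = {"assigned_developer": (1.0, 0.0)}


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (h + r) and t."""
    h = entity_emb[head]
    r = relation_emb[relation]
    t = entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_candidates(task, relation, candidates):
    """Link prediction as ranking: order tails of (task, relation, ?) by score."""
    return sorted(
        candidates,
        key=lambda user: transe_score(task, relation, user),
        reverse=True,
    )


ranking = rank_candidates(
    "task:T1",
    "assigned_developer",
    ["user:bob", "user:carol", "user:alice"],
)
print(ranking)
```

With these toy vectors, `user:alice` sits closest to `task:T1 + assigned_developer` and is ranked first. Swapping in another embedding model (for example HolE) would change only the scoring function, not the ranking protocol.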

My Summary

TeReKG shifts software team recommendation from manually crafted local features to global patterns extracted automatically with knowledge graph embeddings. The framework addresses the subjectivity and locality issues of previous approaches while incorporating the temporal dynamics of software development. Experimental results on three popular open-source projects (Moodle, Apache, Atlassian) show that TeReKG outperforms state-of-the-art baselines in both single-role and team recommendation tasks.

The evaluation shows that the HolE knowledge graph embedding achieves the best performance across datasets, and that TeReKG consistently outperforms RECAST and other baselines on multiple evaluation metrics, including MRR, Hit@K, and MAP.
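
For reference, the ranking metrics named above can be computed roughly as follows. This is a sketch using the common textbook definitions, which may differ in detail from the paper's evaluation protocol; the user names and the three toy queries are made up for illustration.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none appears."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision(ranked, relevant):
    """Mean of precision@i over the positions i that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy queries: (ranked recommendations, set of ground-truth users).
queries = [
    (["alice", "bob", "carol"], {"bob"}),
    (["dave", "erin", "frank"], {"dave", "frank"}),
    (["gina", "hugo", "ivan"], {"zoe"}),
]

mrr = mean(reciprocal_rank(r, rel) for r, rel in queries)
hit_at_1 = mean(hit_at_k(r, rel, 1) for r, rel in queries)
hit_at_2 = mean(hit_at_k(r, rel, 2) for r, rel in queries)
map_score = mean(average_precision(r, rel) for r, rel in queries)
print(mrr, hit_at_1, hit_at_2, map_score)
```

MRR rewards placing the first correct person near the top, Hit@K only checks whether a correct person appears in the top K, and MAP accounts for all correct people in the list.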

The key innovation lies in automatically learning global collaboration behaviors through knowledge graph embeddings rather than relying on researcher-defined features, while the temporal modeling captures the evolving nature of software development. This approach demonstrates the potential of knowledge graphs for capturing global patterns in software engineering applications.

Datasets

Moodle Dataset: 88,655 issues, 450 developers, 195 testers, 133 reviewers, 16 integrators
Data period: 2002/09/05 - 2019/05/22

Apache Dataset: 507,319 issues, 2,265 developers, 41 reviewers
Data period: 2002/04/03 - 2019/07/23

Atlassian Dataset: 238,322 issues, 39 developers, 21 testers, 127 reviewers
Data period: 2004/11/20 - 2019/03/29
