A Python tool employing a LightGBM-based Learning-to-Rank approach to predict and rank source code files by their bug likelihood.

Bug Prediction Using Learning to Rank (L2R)

A Python tool that uses machine learning to rank the files in a repository by their likelihood of containing bugs. It applies Learning-to-Rank techniques so that the files most likely to be buggy appear at the top of the ranking.


Features

  • Data Extraction:
    Retrieves commit data from GitHub through the GitHub API.
  • Data Preprocessing:
    Cleans and normalizes the commit data and engineers features from it.
  • Learning-to-Rank Implementation:
    Uses LightGBM to rank source code files by bug likelihood.
  • Model Evaluation:
    Evaluates the model with metrics such as NDCG, precision, recall, accuracy, and F1 score.
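
To make the NDCG metric concrete, here is a minimal, dependency-free sketch of NDCG@k with linear gain. It is an illustration only: the function names, labels, and scores are hypothetical, and the project itself may rely on a library implementation instead.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_relevance, predicted_scores, k):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    # Order files by predicted score, highest first, then read off their true labels.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(true_relevance, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical data: 1 = the file was later involved in a bug fix, 0 = it was not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.1, 0.2]
ndcg = ndcg_at_k(labels, scores, k=4)
```

A value of 1.0 means the predicted order matches the ideal order. Here one buggy file is ranked last, so the score falls below 1.0.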

Research Artifacts

For an in-depth look at the methodology, experimental evaluation, and outcomes, please refer to the following documents:


Getting Started

Prerequisites

  • Python 3.x
  • Pandas
  • LightGBM
  • Matplotlib
  • Seaborn
  • Scikit-learn

Installing

  1. Clone the Repository:
git clone https://github.com/gillemta/bug-likelihood-ranker.git
  2. Navigate to the Project Directory and Install Dependencies:
cd bug-likelihood-ranker
pip install -r requirements.txt

Usage

  1. Insert Your GitHub API Bearer Token: In the main script, set your bearer token:
GITHUB_TOKEN = "<insert-token-here>"
  2. Configure Data Retrieval: Modify the max_pages variable to specify the number of pages of commit data to pull (default is 50):
max_pages = 50
  3. Run the Main Program:
python main.py
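
To show how the token and max_pages plausibly work together, the following sketch pages through the GitHub REST API commits endpoint using only the standard library. It is not the project's actual code: the function names, the per_page value, and the stop-on-empty-page rule are assumptions made for illustration.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_page(owner, repo, token, page, per_page=100):
    """Fetch one page of commits; the bearer token goes in the Authorization header."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits?per_page={per_page}&page={page}"
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_commits(owner, repo, token, max_pages=50, get_page=fetch_page):
    """Collect commits page by page, stopping at max_pages or the first empty page."""
    commits = []
    for page in range(1, max_pages + 1):
        batch = get_page(owner, repo, token, page)
        if not batch:  # an empty page means the history is exhausted
            break
        commits.extend(batch)
    return commits
```

A call such as fetch_commits("<owner>", "<repo>", GITHUB_TOKEN, max_pages=50) would request at most 50 pages. Reading the token from an environment variable rather than hard-coding it in the script keeps it out of version control.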

Bearer Token Setup

  1. Log in to GitHub: Sign in to your GitHub account.
  2. Access Token Settings: Click your profile picture in the top-right corner, then select Settings.
  3. Developer Settings: Scroll down and click on Developer Settings.
  4. Personal Access Tokens: Select Personal access tokens.
  5. Generate a New Token: Click Generate new token, provide a descriptive name (e.g., "Bug Likelihood Ranker Token"), and select the required scopes (typically the repo scope).
  6. Generate and Copy the Token: Click Generate token and immediately copy the token, as it will not be shown again.
  7. Insert the Token: Paste the token into your program where indicated.

Troubleshooting

If you receive an error such as:

Error 401: {"message":"Bad credentials","documentation_url":"https://docs.github.com/rest"}

Double-check your token and ensure it’s correctly inserted. For more detailed guidance, refer to GitHub's documentation on personal access tokens.
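
One way to check a token before a full run is to call an endpoint that requires authentication and look for the 401 status. The sketch below assumes the GitHub REST API /user endpoint is suitable for this. The function name and the injectable opener parameter are hypothetical and exist only to keep the example easy to test.

```python
import urllib.error
import urllib.request


def token_is_valid(token, opener=urllib.request.urlopen):
    """Return True if GitHub accepts the token, False on a 401 Bad credentials reply."""
    request = urllib.request.Request("https://api.github.com/user", headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:  # the "Bad credentials" case shown above
            return False
        raise  # other failures (rate limits, network errors) are not token problems
```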


Screenshots / Demo

Below is an example of the file the program outputs after it runs:

Ranking File
