A Python tool that uses machine learning to rank repository files by their likelihood of containing bugs. It applies Learning-to-Rank (L2R) techniques to support efficient bug prediction and code quality improvement.
- Data Extraction: Retrieves commit data from GitHub using the GitHub API.
- Data Preprocessing: Performs cleaning, normalization, and feature engineering on the commit data.
- Learning-to-Rank Implementation: Uses LightGBM to rank source code files by bug likelihood.
- Model Evaluation: Evaluates the model with NDCG, precision, recall, accuracy, and F1 score.
For an in-depth look at the methodology, experimental evaluation, and outcomes, please refer to the following documents:
- Research Paper: Ranking Source Code for Bug Prediction: An L2R Approach on An Open Source Repository
- Presentation: Bug Prediction Using Learning-to-Rank
- Python 3.x
- Pandas
- LightGBM
- Matplotlib
- Seaborn
- Scikit-learn
- Clone the Repository:
git clone https://github.com/gillemta/bug-likelihood-ranker.git
- Navigate to the Project Directory and Install Dependencies:
cd bug-likelihood-ranker
pip install -r requirements.txt
- Insert Your GitHub API Bearer Token: In the main script, set your bearer token:
GITHUB_TOKEN = "<insert-token-here>"
- Configure Data Retrieval: Modify the max_pages variable to set the number of pages of commit data to pull (default is 50):
max_pages = 50
- Run the Main Program:
python main.py
To generate a GitHub API token:
- Log in to GitHub: Sign in to your GitHub account.
- Access Token Settings: Click your profile picture in the top-right corner, then select Settings.
- Developer Settings: Scroll down and click Developer Settings.
- Personal Access Tokens: Select Personal access tokens.
- Generate a New Token: Click Generate new token, provide a descriptive name (e.g., "Bug Likelihood Ranker Token"), and select the required scopes (typically the repo scope).
- Generate and Copy the Token: Click Generate token and immediately copy the token, as it will not be shown again.
- Insert the Token: Paste the token into the GITHUB_TOKEN assignment in the main script.
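To illustrate how the token and the max_pages setting could be used together when pulling commit data, here is a minimal standard-library sketch. It pages through GitHub's REST endpoint GET /repos/{owner}/{repo}/commits using the per_page and page query parameters, sends the token as a bearer token, and stops at max_pages or at the first empty page. The helper names (build_commits_request, fetch_json, iter_commits) and the injectable fetch_page argument are hypothetical; the project's own retrieval code may be structured differently.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_ROOT = "https://api.github.com"


def build_commits_request(owner: str, repo: str, token: str, page: int,
                          per_page: int = 100) -> urllib.request.Request:
    """Build an authenticated request for one page of the commits endpoint."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits?{query}"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })


def fetch_json(request: urllib.request.Request) -> list:
    """Send the request and decode the JSON body (one page of commits)."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def iter_commits(owner: str, repo: str, token: str, max_pages: int = 50,
                 fetch_page: Callable[[urllib.request.Request], list] = fetch_json,
                 ) -> Iterator[dict]:
    """Yield commits page by page, stopping at max_pages or the first empty page."""
    for page in range(1, max_pages + 1):  # GitHub pages are 1-indexed
        commits = fetch_page(build_commits_request(owner, repo, token, page))
        if not commits:  # an empty page means there is no more history
            break
        yield from commits
```

For example, iter_commits("OWNER", "REPO", GITHUB_TOKEN, max_pages=50) would request at most 50 pages of up to 100 commits each, so at most 5,000 commits, assuming per_page=100 as in this sketch.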
If you receive an error such as:
Error 401: {"message":"Bad credentials","documentation_url":"https://docs.github.com/rest"}
then double-check that the token is correct, has not expired or been revoked, and is inserted correctly. For more detailed guidance, refer to GitHub's documentation on personal access tokens.
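One way to catch a bad token before starting a long data pull is to call an authenticated endpoint first. The sketch below uses only the standard library: it requests GET https://api.github.com/user, which returns the authenticated account for a valid token, and turns an HTTP 401 into a readable error. The check_token function and its opener parameter are hypothetical helpers, not part of the project.

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def check_token(token: str,
                opener: Callable[..., object] = urllib.request.urlopen) -> str:
    """Return the account login for a valid token; raise ValueError on HTTP 401."""
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    try:
        with opener(req, timeout=30) as resp:
            return json.load(resp)["login"]
    except urllib.error.HTTPError as err:
        if err.code == 401:  # GitHub's "Bad credentials" response
            raise ValueError(
                "GitHub rejected the token (401 Bad credentials): check that it "
                "was copied completely and has not expired or been revoked."
            ) from err
        raise  # other HTTP errors (rate limits, outages) are passed through
```

Calling check_token(GITHUB_TOKEN) at startup would surface the Bad credentials case immediately instead of partway through retrieval.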
Below is an example of the file the program outputs after it runs.
