Automatically cluster your starred GitHub repositories using semantic similarity analysis. This tool intelligently tracks your starred repos, analyzes their content, and groups them into meaningful clusters with incremental updates to preserve your established organization.
Incremental Pipeline Architecture:
- Repository Tracking (`fetch_starred.py`) - Tracks starred repositories and identifies new/updated/removed repos
- Data Collection (`collect_data.py`) - Incrementally fetches detailed metadata only for changed repositories
- Smart Clustering (`cluster_repos.py`) - Uses semantic similarity with three modes:
  - Auto: Assigns new repos to existing clusters, reclusters if major changes
  - Assign: Only assigns new repos to existing clusters (preserves organization)
  - Full: Complete reclustering from scratch
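The tracking step above can be pictured as a diff between two snapshots of the starred list. The following is a minimal sketch, not the code in `fetch_starred.py`: the function name `diff_starred` is hypothetical, and it assumes each record carries the `full_name` and `pushed_at` fields found in GitHub REST API repository objects, with a changed `pushed_at` treated as the "updated" signal.

```python
from typing import Dict, List


def diff_starred(previous: List[dict], current: List[dict]) -> Dict[str, List[str]]:
    """Classify repositories as new, updated, or removed between two snapshots.

    Hypothetical helper: assumes `full_name` uniquely identifies a repository
    and that a changed `pushed_at` timestamp means the repository was updated.
    """
    # Index both snapshots by repository name for set-style comparison.
    prev_by_name = {repo["full_name"]: repo for repo in previous}
    curr_by_name = {repo["full_name"]: repo for repo in current}

    # Present now but not before: newly starred.
    new = sorted(curr_by_name.keys() - prev_by_name.keys())
    # Present before but not now: unstarred or deleted.
    removed = sorted(prev_by_name.keys() - curr_by_name.keys())
    # Present in both snapshots, but the push timestamp differs.
    updated = sorted(
        name
        for name in curr_by_name.keys() & prev_by_name.keys()
        if curr_by_name[name].get("pushed_at") != prev_by_name[name].get("pushed_at")
    )
    return {"new": new, "updated": updated, "removed": removed}
```

Only the names under `new` and `updated` would then need a detailed metadata fetch, which is what keeps the data collection step incremental.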
Add this action to your workflow:
- name: Cluster starred repositories
  uses: gojiplus/starclass@main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    cluster-mode: 'auto' # or 'assign' or 'full'
This repository includes a demo workflow. Fork it and:
- Set repository secrets:
  - `GH_PAT`: Personal access token with `user` scope
- Go to Actions tab → "Demo StarClass Action" → "Run workflow"
# Step 1: Track starred repositories
GH_TOKEN=<your-token> GH_USER=<your-username> python scripts/fetch_starred.py
# Step 2: Collect detailed data (incremental)
GH_TOKEN=<your-token> GH_USER=<your-username> python scripts/collect_data.py
# Step 3: Generate clusters
CLUSTER_MODE=auto python scripts/cluster_repos.py
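The three commands above can also be driven from a single Python entry point. This is a rough sketch rather than part of the repository: `build_steps` and `run_pipeline` are hypothetical names, while the script paths and the `GH_TOKEN`, `GH_USER`, and `CLUSTER_MODE` variables are taken from the commands shown above.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

Step = Tuple[List[str], Dict[str, str]]


def build_steps(token: str, user: str, cluster_mode: str = "auto") -> List[Step]:
    """Return (command, extra environment) pairs for the three pipeline steps."""
    credentials = {"GH_TOKEN": token, "GH_USER": user}
    return [
        # Step 1: track starred repositories.
        ([sys.executable, "scripts/fetch_starred.py"], credentials),
        # Step 2: collect detailed data (incremental).
        ([sys.executable, "scripts/collect_data.py"], credentials),
        # Step 3: generate clusters in the requested mode.
        ([sys.executable, "scripts/cluster_repos.py"], {"CLUSTER_MODE": cluster_mode}),
    ]


def run_pipeline(token: str, user: str, cluster_mode: str = "auto") -> None:
    """Run each step in order, stopping at the first non-zero exit code."""
    for command, extra_env in build_steps(token, user, cluster_mode):
        # Merge the step-specific variables into the current environment.
        subprocess.run(command, env={**os.environ, **extra_env}, check=True)
```

Calling `run_pipeline(os.environ["GH_TOKEN"], os.environ["GH_USER"])` from the repository root would run the steps in sequence; `check=True` means a failed fetch stops the run before clustering starts.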
- auto (default): Smart mode that assigns new repos to existing clusters, but reclusters everything if there are major changes
- assign: Only assigns new repositories to existing clusters, preserving your current organization
- full: Performs complete reclustering of all repositories from scratch
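The difference between the three modes can be sketched as a small decision function plus a nearest-cluster assignment. This is illustrative only and makes several assumptions that the description above does not confirm: "major changes" is modelled as the fraction of changed repositories exceeding a hypothetical `threshold`, a missing saved model forces a full recluster, and assignment picks the cluster centroid with the highest cosine similarity. The names `choose_action` and `assign_to_nearest` are invented for this example.

```python
import math
from typing import Dict, List


def choose_action(mode: str, n_existing: int, n_changed: int,
                  has_model: bool, threshold: float = 0.2) -> str:
    """Return 'assign' or 'recluster' for the given mode (hypothetical logic)."""
    if mode not in {"auto", "assign", "full"}:
        raise ValueError(f"unknown cluster mode: {mode!r}")
    if mode == "full" or not has_model:
        # Assumption: without a saved model there is nothing to assign into.
        return "recluster"
    if mode == "assign":
        return "assign"
    # auto: recluster only when the share of changed repos is large (assumed rule).
    changed_fraction = n_changed / max(n_existing, 1)
    return "recluster" if changed_fraction > threshold else "assign"


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_to_nearest(vector: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label of the centroid most similar to `vector`."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))
```

Under these assumptions, `assign` never moves repositories that are already clustered, which is why it preserves an established organization, while `auto` trades that stability for fresher clusters once enough of the list has changed.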
- `starred_repos_list.json`: Tracked starred repositories with metadata
- `starred_repos_changes.json`: Summary of new/updated/removed repositories
- `starred_repos_data.json`: Detailed repository data (cached)
- `starred_repos_clusters.json`: Clustered repositories by topic
- `starred_repos_clusters.md`: Human-readable clustered output
- `clustering_model.joblib`: Saved clustering model for incremental updates