feat: add scripts to fine tune qwen3 for knowledge specialization #447
base: main
Conversation
Signed-off-by: Huamin Chen <hchen@redhat.com>
Pull Request Overview
This PR adds scripts and implementations to fine-tune Qwen3 0.6B models for specialized knowledge domains using MMLU-Pro benchmarks. The key focus is on training domain-specific expert models (math, science, law, humanities, etc.) with and without data leakage scenarios to demonstrate proper evaluation methodology.
Key changes:
- Added complete training pipeline for 6 specialized Qwen3 models
- Implemented both "leakage" (demo) and "no-leakage" (proper) training approaches
- Included batch training scripts with comprehensive logging and result tracking
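As a minimal sketch of the leakage vs. no-leakage distinction described above (dataset names and helper functions here are illustrative, not the PR's actual code), the key invariant in the "no-leakage" approach is that the evaluation benchmark never appears in a specialist's training data:

```python
# Sketch of leakage vs. no-leakage dataset selection (hypothetical mapping;
# the real scripts load GSM8K, MATH, ARC, etc. for training).

# External training sets per specialist: MMLU-Pro is never used for training.
NO_LEAKAGE_TRAIN_SETS = {
    "math": ["gsm8k", "math"],
    "science": ["arc-challenge", "sciq"],
    "law": ["casehold"],
}

# The eval set is always the benchmark itself.
EVAL_SET = "mmlu-pro"

def pick_training_data(domain: str, leakage_demo: bool = False) -> list[str]:
    """Return the dataset names used to train one specialist.

    leakage_demo=True reproduces the demo setup that trains directly on
    the benchmark (MMLU-Pro), which inflates evaluation scores.
    """
    if leakage_demo:
        return [EVAL_SET]                     # train and eval overlap: leakage
    return NO_LEAKAGE_TRAIN_SETS[domain]      # disjoint from the eval set

def has_leakage(domain: str, leakage_demo: bool = False) -> bool:
    return EVAL_SET in pick_training_data(domain, leakage_demo)
```

The point of the two code paths is purely methodological: the "leakage" variant demonstrates how much benchmark contamination distorts results, while the "no-leakage" variant gives a defensible accuracy number.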
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| train_all_specialists_no_leakage.sh | Bash script that batch-trains the 6 specialists on external datasets (no MMLU-Pro leakage) |
| train_all_specialists.sh | Bash script that batch-trains the 6 specialists on MMLU-Pro data (with leakage, for demo) |
| ft_qwen3_mmlu_solver_lora_no_leakage.py | Python training script using external datasets (GSM8K, MATH, ARC, etc.) for proper evaluation |
| ft_qwen3_mmlu_solver_lora.py | Python training script using MMLU-Pro data directly (demo/leakage version) |
| README.md | Documentation updated to include the new MMLU-Pro solver task |
TOTAL_MODELS=6
COMPLETED_MODELS=0
FAILED_MODELS=0
Copilot AI, Oct 15, 2025:
FAILED_MODELS is initialized as a scalar (=0) but later used as both an array (FAILED_MODELS+=(...)) and incremented as a counter. This will cause incorrect behavior. Use separate variables for the counter and the array.
Suggested change:
- FAILED_MODELS=0
+ FAILED_MODELS_COUNT=0
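A minimal sketch of the suggested fix (variable and function names are illustrative, not the script's actual contents), keeping the scalar counter and the array of failed names separate:

```shell
#!/usr/bin/env bash
# Sketch: separate counter and failure list so neither usage clobbers the other.
TOTAL_MODELS=6
COMPLETED_MODELS=0
FAILED_MODELS_COUNT=0   # scalar counter
FAILED_MODELS=()        # array of failed specialist names

train_specialist() {
    # Placeholder for the real training invocation; fails for "law" here
    # purely to demonstrate the bookkeeping.
    [ "$1" != "law" ]
}

for model in math science law humanities history other; do
    if train_specialist "$model"; then
        COMPLETED_MODELS=$((COMPLETED_MODELS + 1))
    else
        FAILED_MODELS+=("$model")                         # append to the array
        FAILED_MODELS_COUNT=$((FAILED_MODELS_COUNT + 1))  # bump the counter
    fi
done

echo "completed=$COMPLETED_MODELS failed=$FAILED_MODELS_COUNT: ${FAILED_MODELS[*]}"
```

With the mixed usage in the original, `FAILED_MODELS+=(...)` silently converts the scalar into an array whose element 0 is `0`, so later arithmetic on `FAILED_MODELS` operates on that first element rather than a true count.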
With the no-leakage training, math accuracy improves after fine-tuning: train_all_specialists_no_leakage.sh 2 100 5
Signed-off-by: Huamin Chen <hchen@redhat.com>
What type of PR is this?
Fine-tune Qwen3 0.6B for knowledge specialization. There are two approaches: training on MMLU-Pro itself (i.e., with leakage) for demo purposes, and training on other datasets (no leakage) for proper evaluation.
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #239
Release Notes: Yes/No