
Conversation

@rootfs (Collaborator) commented Oct 15, 2025

What type of PR is this?

Fine-tune Qwen3 0.6B for specialization. There are two approaches: using MMLU-Pro as the training dataset (i.e. data leakage) for demo purposes, and using other datasets for training (no leakage).
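
As a rough sketch of what this kind of LoRA specialization looks like (the model id, target modules, and hyperparameters below are illustrative assumptions, not values taken from this PR's scripts):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

# Base model to specialize; the exact id is an assumption for illustration.
model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA adapter configuration; rank/alpha/targets are common defaults, not the PR's.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable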

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #239

Release Notes: Yes/No

Signed-off-by: Huamin Chen <hchen@redhat.com>

netlify bot commented Oct 15, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: 01e9b4b
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68f01e7b825c8a0008c5ad5e
😎 Deploy Preview: https://deploy-preview-447--vllm-semantic-router.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.


github-actions bot commented Oct 15, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/training/training_lora/mmlu_pro_solver_lora/ft_qwen3_mmlu_solver_lora.py
  • src/training/training_lora/mmlu_pro_solver_lora/ft_qwen3_mmlu_solver_lora_no_leakage.py
  • src/training/training_lora/mmlu_pro_solver_lora/train_all_specialists.sh
  • src/training/training_lora/mmlu_pro_solver_lora/train_all_specialists_no_leakage.sh
  • src/training/training_lora/README.md
  • src/training/training_lora/classifier_model_fine_tuning_lora/ft_qwen3_generative_lora.py

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • examples/mcp-classifier-server/server_generative.py


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@Copilot Copilot AI (Contributor) left a comment


Pull Request Overview

This PR adds scripts and implementations to fine-tune Qwen3 0.6B models for specialized knowledge domains using MMLU-Pro benchmarks. The key focus is on training domain-specific expert models (math, science, law, humanities, etc.) with and without data leakage scenarios to demonstrate proper evaluation methodology.
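
As a rough illustration of the two approaches, the no-leakage variant trains on external datasets and reserves MMLU-Pro strictly for evaluation. A minimal sketch (the dataset ids below are assumptions, not read from the PR):

from datasets import load_dataset

# No-leakage setup: train on external data (GSM8K as an example math corpus)...
train_data = load_dataset("gsm8k", "main", split="train")

# ...and evaluate on MMLU-Pro only, so benchmark questions never enter training.
eval_data = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

# The leakage variant instead trains on MMLU-Pro itself, which inflates
# benchmark scores and is intended only as a demo.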

Key changes:

  • Added complete training pipeline for 6 specialized Qwen3 models
  • Implemented both "leakage" (demo) and "no-leakage" (proper) training approaches
  • Included batch training scripts with comprehensive logging and result tracking

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • train_all_specialists_no_leakage.sh: Bash script for batch-training the 6 specialists on external datasets (no MMLU-Pro leakage)
  • train_all_specialists.sh: Bash script for batch-training the 6 specialists on MMLU-Pro data (with leakage, for demo)
  • ft_qwen3_mmlu_solver_lora_no_leakage.py: Python training script using external datasets (GSM8K, MATH, ARC, etc.) for proper evaluation
  • ft_qwen3_mmlu_solver_lora.py: Python training script using MMLU-Pro data directly (demo/leakage version)
  • README.md: Updated documentation to include the new MMLU-Pro solver task



TOTAL_MODELS=6        # number of specialist models to train
COMPLETED_MODELS=0    # success counter
FAILED_MODELS=0       # scalar here, but reused as an array later (see review comment below)

Copilot AI Oct 15, 2025


FAILED_MODELS is initialized as a scalar (=0) but is later used both as an array (FAILED_MODELS+=(...)) and as an incremented counter. This will cause incorrect behavior. Use separate variables for the counter and the array.

Suggested change:
- FAILED_MODELS=0
+ FAILED_MODELS_COUNT=0


@rootfs rootfs marked this pull request as draft October 15, 2025 18:48
@rootfs (Collaborator, Author) commented Oct 15, 2025

With the no_leakage training, math accuracy improves after fine-tuning:

train_all_specialists_no_leakage.sh 2 100 5
2025-10-15 20:10:22 - common_lora_utils - INFO - IMPROVEMENT ANALYSIS (No Data Leakage)
2025-10-15 20:10:22 - common_lora_utils - INFO - 📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊📊
2025-10-15 20:10:22 - common_lora_utils - INFO - 
================================================================================
2025-10-15 20:10:22 - common_lora_utils - INFO - OVERALL RESULTS:
2025-10-15 20:10:22 - common_lora_utils - INFO - ================================================================================
2025-10-15 20:10:22 - common_lora_utils - INFO -   Baseline (Untrained):     10.00%
2025-10-15 20:10:22 - common_lora_utils - INFO -   Post-training:            24.00%
2025-10-15 20:10:22 - common_lora_utils - INFO -   Absolute Improvement:     +14.00%
2025-10-15 20:10:22 - common_lora_utils - INFO -   Relative Improvement:     +140.0%
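
The improvement figures follow directly from the two accuracies; a quick check of the arithmetic (plain Python, not code from the PR):

baseline = 0.10  # accuracy before fine-tuning
post = 0.24      # accuracy after no-leakage fine-tuning

absolute = post - baseline      # 0.14 -> +14.00 percentage points
relative = absolute / baseline  # 1.40 -> +140.0% relative improvement
print(f"{absolute:+.2%} absolute, {relative:+.1%} relative")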



Development

Successfully merging this pull request may close: Create Specialized LLM Models for the Lightweight LLM Server for Integration Test (#239)