Finetune OpenAI's open-source GPT-OSS models (gpt-oss-20b and gpt-oss-120b) up to 2.5× faster with our custom Triton-based flash attention kernels.
Supports long-context full finetuning (up to 16K tokens), all on a single node.
Existing flash attention libraries don't fully support GPT-OSS because the backward pass is missing. We provide custom Triton-based flash attention kernels, optimized for GPT-OSS, that unlock faster, memory-efficient training:
Key Benefits:
- 2.5× throughput gains
- Lower memory usage
- Efficient finetuning with long context
The detailed implementation is available in attention_kernel.py.
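For context, a flash-style kernel computes standard scaled-dot-product attention block by block, so the full (seq × seq) score matrix is never materialized; that is where the memory savings come from. The sketch below is not the Triton kernel itself, only a plain-PyTorch reference (causal attention only, ignoring GPT-OSS's attention sinks and sliding-window layers) that can serve as a correctness baseline:
import torch
import torch.nn.functional as F

def eager_attention_reference(q, k, v):
    # Naive attention: materializes the full (seq, seq) score matrix.
    # q, k, v have shape (batch, heads, seq, head_dim).
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale
    seq = q.shape[-2]
    causal_mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(causal_mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Compare against PyTorch's fused SDPA, a stand-in for a memory-efficient kernel.
q, k, v = (torch.randn(1, 4, 128, 64) for _ in range(3))
assert torch.allclose(eager_attention_reference(q, k, v),
                      F.scaled_dot_product_attention(q, k, v, is_causal=True),
                      atol=1e-5)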
Training throughput improvements are measured in tokens/sec per GPU.
Patch Hugging Face’s default GPT-OSS attention with our optimized kernels:
from transformers.models.gpt_oss import modeling_gpt_oss
from attention_kernel import triton_flash_attention

# Replace the default eager attention with the Triton flash attention kernel
modeling_gpt_oss.eager_attention_forward = triton_flash_attention
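The patch replaces the module-level eager attention function, so (as an assumption about how the kernels are wired in) the model should be loaded with the eager attention path for it to take effect:
import torch
from transformers import AutoModelForCausalLM

# Assumption: the patched eager_attention_forward is picked up via the eager path
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)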
Or simply use our helper:
from patching import patch_flash_attention

patch_flash_attention()
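The helper is presumably a thin wrapper around the monkey patch above; a minimal sketch of what patching.py might contain (hypothetical, the actual helper may do more):
# patching.py -- hypothetical sketch, not the repository's actual code
from transformers.models.gpt_oss import modeling_gpt_oss
from attention_kernel import triton_flash_attention

def patch_flash_attention():
    # Swap GPT-OSS's eager attention for the Triton flash attention kernel
    modeling_gpt_oss.eager_attention_forward = triton_flash_attention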
Create a Python 3.11 virtual environment and install dependencies:
# Create and activate the virtual environment
uv venv gpt-oss --python 3.11
source gpt-oss/bin/activate
# Install dependencies
uv pip install --upgrade pip
uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
uv pip install -r requirements.txt
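After installation, a quick check can confirm the environment (assuming a CUDA-capable GPU; Triton ships with the CUDA PyTorch wheel):
# Quick sanity check that the key packages and a CUDA GPU are visible
import torch, transformers, triton
print("torch", torch.__version__, "| transformers", transformers.__version__, "| triton", triton.__version__)
print("CUDA available:", torch.cuda.is_available())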
Both full finetuning and parameter-efficient (LoRA) finetuning are supported.
- Full Finetuning (multi-GPU)
accelerate launch --config_file configs/zero3.yaml sft.py --config configs/sft_full.yaml
- LoRA Finetuning (single-GPU)
python sft.py --config configs/sft_lora.yaml
To customize training, modify the config files (sft_lora.yaml or sft_full.yaml) or override parameters directly from the command line.
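For example, assuming sft.py exposes standard TRL/Transformers training arguments (an assumption; check the script for the exact names), a LoRA run could be adjusted like this:
python sft.py --config configs/sft_lora.yaml --learning_rate 2e-4 --num_train_epochs 1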
This work builds upon the GPT-OSS Recipes codebase from Hugging Face: https://github.com/huggingface/gpt-oss-recipes