Commit 36ec29e

Fix SFT example (8bit quant, trl) (#2857)
This PR fixes a few issues with the examples/sft example.

- There was an error in argument parsing because trl renamed an argument to max_length, which conflicted with another argument name already in use.
- If a user wanted to choose 8bit bnb quantization, they also had to pass use_4bit_quantization=True due to a wrong indentation (illustrated in the sketch below).
- Documented that 8bit quantization does not work with FSDP (the aforementioned bug may have masked this).
1 parent 77daa8d commit 36ec29e
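The indentation issue mentioned in the commit message follows a common pattern: one branch accidentally nested under a sibling condition. The sketch below is illustrative only, not the actual `examples/sft` helper code; the flag names mirror the script options, and the function name is made up.

```python
from transformers import BitsAndBytesConfig


def build_bnb_config(args):
    """Hypothetical sketch of the bug shape described in the commit message.

    Before the fix (buggy, due to indentation), the 8bit branch was nested
    under the 4bit check, so passing only use_8bit_quantization=True did
    nothing:

        if args.use_4bit_quantization:
            ...
            if args.use_8bit_quantization:  # unreachable without the 4bit flag
                ...

    After the fix, each quantization mode is checked at the same level:
    """
    if args.use_4bit_quantization:
        return BitsAndBytesConfig(load_in_4bit=True)
    if args.use_8bit_quantization:
        return BitsAndBytesConfig(load_in_8bit=True)
    return None
```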

File tree: 11 files changed (+21, -20 lines)


examples/sft/README.md

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,7 @@ In this example, we'll see how to use [PEFT](https://github.com/huggingface/peft
 QLoRA uses 4-bit quantization of the base model to drastically reduce the GPU memory consumed by the base model while using LoRA for parameter-efficient fine-tuning. The command to use QLoRA is present at [run_peft.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft.sh).
 
 Note:
-1. At present, `use_reentrant` needs to be `True` when using gradient checkpointing with QLoRA else QLoRA leads to high GPU memory consumption.
+1. At present, `use_reentrant` needs to be `True` when using gradient checkpointing with QLoRA or else QLoRA leads to high GPU memory consumption.
 
 
 ## Single GPU SFT with QLoRA using Unsloth
@@ -29,6 +29,8 @@ When you have access to multiple GPUs, it would be better to use normal LoRA wit
 ## Multi-GPU SFT with LoRA and FSDP
 When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with FSDP, refer to the docs at [PEFT with FSDP](https://huggingface.co/docs/peft/accelerate/fsdp).
 
+Note: FSDP is currently not compatible with 8bit bitsandbytes quantization.
+
 
 ## Multi-GPU SFT with LoRA and FSDP for GPTQModel:
 As in [Multi-GPU SFT with LoRA and FSDP](https://github.com/huggingface/peft/blob/main/examples/sft/README.md#multi-gpu-sft-with-lora-and-fsdp), we also support other quantization methods like GPTQModel. You may need to install [GPTQModel](https://github.com/ModelCloud/GPTQModel) > v3.0.0 or from source. Here is the launch command for reference: [run_peft_fsdp_gptq.sh]. For the `--model_name_or_path` argument, it is important to pass a model that is already quantized with GPTQModel, like `"hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"`.
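The `use_reentrant` note in the first hunk maps onto the `gradient_checkpointing_kwargs` option of `transformers`' `TrainingArguments`, which trl's `SFTConfig` inherits. A minimal sketch, assuming a recent trl; the output path is a placeholder.

```python
from trl import SFTConfig

# QLoRA + gradient checkpointing: per the note above, keep use_reentrant=True,
# otherwise GPU memory consumption is reported to be much higher.
training_args = SFTConfig(
    output_dir="llama-sft-qlora",  # placeholder
    max_length=2048,               # trl renamed max_seq_length to max_length
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": True},
)
```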

examples/sft/run_peft.sh

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ python train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \
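The `--max_seq_len` to `--max_length` rename in these scripts tracks the corresponding rename in trl's `SFTConfig`. If you are unsure which name your installed trl expects, a quick check along these lines should tell you (a sketch, assuming `SFTConfig` is a dataclass as in current trl):

```python
from dataclasses import fields

from trl import SFTConfig

# The updated scripts pass --max_length; older trl versions used max_seq_length.
names = {f.name for f in fields(SFTConfig)}
print("max_length" in names, "max_seq_length" in names)
```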

examples/sft/run_peft_deepspeed.sh

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ accelerate launch --config_file "configs/deepspeed_config.yaml" train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \
@@ -36,4 +36,4 @@ accelerate launch --config_file "configs/deepspeed_config.yaml" train.py \
 --lora_alpha 16 \
 --lora_dropout 0.1 \
 --lora_target_modules "all-linear" \
---use_4bit_quantization False
+--use_4bit_quantization False

examples/sft/run_peft_fsdp.sh

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \

examples/sft/run_peft_fsdp_gptq.sh

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \
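As the README hunk above notes, this GPTQ launch script expects a checkpoint that is already quantized with GPTQModel. A minimal loading sketch for a quick local check; `device_map="auto"` is only for a single-process smoke test, while the FSDP run itself goes through the accelerate config referenced in the script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint referenced in the README; it already carries a GPTQ quantization config.
model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```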

examples/sft/run_peft_multigpu.sh

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ torchrun --nproc_per_node 8 --nnodes 1 train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \

examples/sft/run_peft_qlora_deepspeed_stage3.sh

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ accelerate launch --config_file "configs/deepspeed_config_z3_qlora.yaml" train.
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \
@@ -39,4 +39,4 @@ accelerate launch --config_file "configs/deepspeed_config_z3_qlora.yaml" train.
 --use_4bit_quantization True \
 --use_nested_quant True \
 --bnb_4bit_compute_dtype "bfloat16" \
---bnb_4bit_quant_storage_dtype "bfloat16"
+--bnb_4bit_quant_storage_dtype "bfloat16"
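The bitsandbytes flags at the end of this script correspond to `transformers`' `BitsAndBytesConfig`; a rough sketch of the equivalent object, assuming the example's helper builds something along these lines:

```python
import torch
from transformers import BitsAndBytesConfig

# Rough equivalent of:
#   --use_4bit_quantization True --use_nested_quant True
#   --bnb_4bit_compute_dtype "bfloat16" --bnb_4bit_quant_storage_dtype "bfloat16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,          # "nested" quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,   # uniform storage dtype helps FSDP/DeepSpeed sharding
)
```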

examples/sft/run_peft_qlora_fsdp.sh

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ accelerate launch --config_file "configs/fsdp_config_qlora.yaml" train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \

examples/sft/run_unsloth_peft.sh

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ python train.py \
 --add_special_tokens False \
 --append_concat_token False \
 --splits "train,test" \
---max_seq_len 2048 \
+--max_length 2048 \
 --num_train_epochs 1 \
 --logging_steps 5 \
 --log_level "info" \

examples/sft/train.py

Lines changed: 1 addition & 4 deletions
@@ -18,10 +18,6 @@ class ModelArguments:
     model_name_or_path: str = field(
         metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
     )
-    max_seq_length: Optional[int] = field(
-        default=512,
-        metadata={"help": "The maximum total input sequence length after tokenization."},
-    )
     chat_template_format: Optional[str] = field(
         default="none",
         metadata={
@@ -156,4 +152,5 @@ def main(model_args, data_args, training_args):
         model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
     else:
         model_args, data_args, training_args = parser.parse_args_into_dataclasses()
+    model_args.max_length = training_args.max_length
     main(model_args, data_args, training_args)
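The removed `max_seq_length` field is what clashed after trl's rename: `HfArgumentParser` turns every dataclass field into an argparse option, so two dataclasses exposing the same field name make `add_argument` raise a conflict while the parser is built. A stripped-down reproduction sketch; the dataclasses here are illustrative, not the example's real ones.

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class ModelArgumentsLike:
    max_length: Optional[int] = field(default=512)


@dataclass
class TrainingArgumentsLike:
    max_length: Optional[int] = field(default=1024)


try:
    # Both dataclasses map their field onto --max_length, so argparse raises
    # "conflicting option string: --max_length" during parser construction.
    parser = HfArgumentParser((ModelArgumentsLike, TrainingArgumentsLike))
    parser.parse_args_into_dataclasses(args=["--max_length", "2048"])
except Exception as err:
    print(type(err).__name__, err)

# The commit's fix: drop the duplicate field from ModelArguments and copy the
# value after parsing, i.e. model_args.max_length = training_args.max_length.
```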
