examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision
1 file changed: +6 −4 lines changed

@@ -27,7 +27,7 @@ pip install -r requirements.txt
 ### Demo (`MXFP4`, `MXFP8`, `NVFP4`, `uNVFP4`)
 
 ```bash
-python quantize.py --model_name_or_path facebook/opt-125m --quantize --dtype MXFP4 --batch_size 8 --accuracy
+python quantize.py --model_name_or_path facebook/opt-125m --quantize --dtype MXFP4 --batch_size 8 --accuracy --enable_torch_compile
 ```
 
 ### Mix-precision Quantization (`MXFP4 + MXFP8`)
@@ -41,7 +41,8 @@ python quantize.py \
     --use_recipe \
     --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
     --accuracy \
-    --batch_size 32
+    --batch_size 32 \
+    --enable_torch_compile
 
 # Llama 3.3 70B
 deepspeed --include="localhost:0,1,2,3" --master_port=29500 quantize.py \
@@ -112,13 +113,14 @@ Model with mixed precision is not supported in vLLM, but supported in transformers
 python quantize.py \
     --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
     --quantize \
-    --iters 0 \
     --dtype MXFP4 \
     --use_recipe \
     --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
     --save \
     --save_format auto_round \
-    --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
+    --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR \
+    --enable_torch_compile
+
 # Command to inference with transformer:
 python run_hf_inf.py Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
 ```
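
For context on the inference step above: `run_hf_inf.py` is the example's own script, and its contents are not part of this diff. Below is a minimal sketch of what loading the saved `auto_round`-format checkpoint in transformers could look like, assuming the `auto-round` package is installed and that importing its `AutoRoundConfig` registers the quantization format with transformers; the prompt and generation settings are illustrative assumptions, not the repository's actual code.

```python
# Illustrative sketch only -- not the repository's actual run_hf_inf.py.
import sys

from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumption: importing AutoRoundConfig registers the auto_round
# quantization format so transformers can deserialize the checkpoint.
from auto_round import AutoRoundConfig  # noqa: F401

model_path = sys.argv[1]  # e.g. Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# Run a short generation as a smoke test of the mixed-precision model.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Invoked the same way as in the diff: `python run_hf_inf.py Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR`.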