This repository contains a Python script to convert Hugging Face models to ONNX and ORT formats, perform quantization, and generate README files for the converted models. The script automates the process of optimizing models for deployment, making it easier to use models in different environments.
- Model Conversion: Convert Hugging Face models to ONNX and ORT formats.
- Model Optimization: Optimize the ONNX models for better performance.
- Quantization: Perform FP16, INT8, and UINT8 quantization on the models (see the sketch after this list).
- README Generation: Automatically generate English and Japanese README files for the converted models.
- Hugging Face Integration: Optionally upload the converted models to Hugging Face Hub.
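As a point of reference, FP16 conversion of an ONNX model is commonly done with onnxconverter-common, and INT8/UINT8 quantization with ONNX Runtime's dynamic quantization API. The following is a minimal sketch under that assumption; it is not the exact code in `convert_model.py`, and the output file names are placeholders.

```python
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

# FP16: cast the optimized ONNX model's tensors to float16
model_fp32 = onnx.load('onnx_models/model_opt.onnx')
model_fp16 = float16.convert_float_to_float16(model_fp32)
onnx.save(model_fp16, 'onnx_models/model_fp16.onnx')  # placeholder output name

# INT8 / UINT8: dynamic quantization (weights quantized offline, activations at runtime)
quantize_dynamic('onnx_models/model_opt.onnx', 'onnx_models/model_int8.onnx',
                 weight_type=QuantType.QInt8)
quantize_dynamic('onnx_models/model_opt.onnx', 'onnx_models/model_uint8.onnx',
                 weight_type=QuantType.QUInt8)
```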
Requirements:

- Python 3.11 or higher
- Install the required packages using `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```

  Alternatively, you can install the packages individually:

  ```bash
  pip install torch transformers onnx onnxruntime onnxconverter-common onnxruntime-tools onnxruntime-transformers huggingface_hub
  ```
Usage:

- Clone the Repository

  ```bash
  git clone https://github.com/yourusername/model_conversion.git
  cd model_conversion
  ```

- Install Dependencies

  Ensure that you have Python 3.11 or higher installed, then install the required packages using `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```
- Run the Conversion Script

  The script `convert_model.py` converts and quantizes the model:

  ```bash
  python convert_model.py --model your-model-name --output_dir output_directory
  ```

  - Replace `your-model-name` with the name or path of the Hugging Face model you want to convert.
  - The `--output_dir` argument specifies the output directory. If not provided, it defaults to the model name.

  Example:

  ```bash
  python convert_model.py --model bert-base-uncased --output_dir bert_onnx
  ```

  For a rough idea of what the conversion involves, see the export sketch after these steps.
- Upload to Hugging Face (Optional)

  To upload the converted models to Hugging Face Hub, add the `--upload` flag:

  ```bash
  python convert_model.py --model your-model-name --output_dir output_directory --upload
  ```

  Make sure you are logged in to the Hugging Face CLI:

  ```bash
  huggingface-cli login
  ```

  A manual alternative using the `huggingface_hub` API is sketched after these steps.
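For a rough idea of what the conversion step involves: exporting a Hugging Face model to ONNX generally boils down to a `torch.onnx.export` call, and the ORT-format model is then produced with ONNX Runtime's conversion tool. The sketch below illustrates that flow for `bert-base-uncased`; it is an assumption about the general approach, not a copy of `convert_model.py`, and the file names are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical export flow (placeholder file names, not the exact code in convert_model.py)
model_name = 'bert-base-uncased'
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

# Trace the model with a dummy input and export it to ONNX
dummy = tokenizer('Hello, world!', return_tensors='pt')
torch.onnx.export(
    model,
    (dummy['input_ids'], dummy['attention_mask']),
    'model.onnx',
    input_names=['input_ids', 'attention_mask'],
    output_names=['last_hidden_state', 'pooler_output'],
    dynamic_axes={'input_ids': {0: 'batch', 1: 'sequence'},
                  'attention_mask': {0: 'batch', 1: 'sequence'}},
    opset_version=17,
)

# The ORT-format model can then be generated with:
#   python -m onnxruntime.tools.convert_onnx_models_to_ort model.onnx
```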
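If you prefer to push the files yourself instead of using the `--upload` flag, the `huggingface_hub` API can upload the output directory directly. A minimal sketch, assuming the `bert_onnx` output directory from the example above and a placeholder repository id:

```python
from huggingface_hub import HfApi, create_repo

# Placeholder repository id; replace with your own namespace/name
repo_id = 'your-username/bert-base-uncased-onnx'

create_repo(repo_id, repo_type='model', exist_ok=True)

api = HfApi()
api.upload_folder(
    folder_path='bert_onnx',   # the --output_dir used during conversion
    repo_id=repo_id,
    repo_type='model',
)
```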
After running the conversion script, you can use the converted models as shown below:
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('your-model-name')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort'         # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
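What the raw outputs mean depends on the model head. For a sequence-classification model, the first output usually contains logits, which a softmax turns into probabilities; a sketch under that assumption:

```python
# Assuming a sequence-classification head: outputs[0] holds logits of shape (batch, num_labels)
logits = outputs[0]
shifted = logits - logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
probabilities = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
print('Predicted class:', probabilities.argmax(axis=-1))
```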
- Ensure that your ONNX Runtime version is 1.15.0 or higher to use ORT format models.
- Adjust the `providers` parameter based on your hardware (e.g., `'CUDAExecutionProvider'` for GPUs).
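For example, to prefer the GPU and fall back to the CPU (assuming the `onnxruntime-gpu` package is installed):

```python
# Prefer CUDA when available; ONNX Runtime falls back to the next provider in the list
session = ort.InferenceSession(
    model_path,
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)
```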
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request for improvements.