INC ONNX Runtime 3.x API design #1532
Unanswered
mengniwang95 asked this question in General
Replies: 0 comments
INC ONNX Runtime 3.x API Design
Target
Main principles
autotune is the exposed user interface API, which requires a set of configurations.
_quantize is an internal API.
GPTQConfig and autotune will use a set of configurations.
Repo Architecture
autotune are imported here.
calculate_scale_zp.
Previous Design
StaticQuant & SmoothQuant
Weight-only Quantization
New Design
StaticQuant & SmoothQuant
Configuration
Each argument to a config is either a single value or a list of values. If the parameters can be combined into different configurations, the returned object is a list of configurations used for autotuning.
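The expansion described above can be illustrated with a minimal sketch. It assumes that list-valued arguments are combined as a Cartesian product, which the text does not confirm. `ToyQuantConfig`, its fields, and `expand` are hypothetical names made up for illustration; they are not the INC API.

```python
from dataclasses import dataclass, fields
from itertools import product
from typing import List, Union


@dataclass
class ToyQuantConfig:
    """Hypothetical stand-in for a quantization config (not the INC class)."""

    # Each argument accepts either a single value or a list of candidates.
    weight_bits: Union[int, List[int]] = 8
    per_channel: Union[bool, List[bool]] = True

    def expand(self) -> List["ToyQuantConfig"]:
        """Return one concrete config per combination of candidate values."""
        names = [f.name for f in fields(self)]
        values = [getattr(self, name) for name in names]
        # Wrap scalars so every field contributes a list of candidates.
        choices = [v if isinstance(v, list) else [v] for v in values]
        # Assumption: candidates are combined as a Cartesian product.
        return [
            ToyQuantConfig(**dict(zip(names, combo)))
            for combo in product(*choices)
        ]


# Two candidates per field give four concrete configurations to tune over.
configs = ToyQuantConfig(weight_bits=[4, 8], per_channel=[True, False]).expand()
for cfg in configs:
    print(cfg)
```

A config built only from single values expands to a one-element list, so a tuning loop could treat both cases the same way.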
Quantize Interface
Weight-only Quantization
Configuration
Quantize Interface