This factor is used in `caching_allocator_warmup` to determine how many bytes to pre-allocate for CUDA warmup.
- A factor of 2 means we pre-allocate the full memory footprint of the model.
- A factor of 4 means we pre-allocate half of that, and so on.
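The factor arithmetic above can be sketched as follows. This is a hypothetical, torch-free illustration: each parameter is represented as a `(numel, element_size)` pair, standing in for the real `param.numel()` and `param.element_size()` calls.

```python
# Hypothetical sketch of the factor logic described above.
# Each parameter is a (numel, element_size_in_bytes) pair; in the real
# code these values come from param.numel() and param.element_size().

def warmup_bytes(params, factor):
    """factor=2 -> pre-allocate the full model footprint,
    factor=4 -> half of it, and so on."""
    footprint = sum(numel * elem_size for numel, elem_size in params)
    return (2 * footprint) // factor

# A toy "model": one 1024x1024 fp32 weight (4 bytes per element).
params = [(1024 * 1024, 4)]
print(warmup_bytes(params, factor=2))  # 4194304 bytes == full footprint
print(warmup_bytes(params, factor=4))  # 2097152 bytes == half of that
```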
However, when using TorchAO, computing memory usage as `param.numel() * param.element_size()` does not give the correct size for quantized weights (such as int4 or int8). That's because TorchAO internally represents quantized tensors using subtensors and metadata, and the reported `element_size()` still corresponds to the `torch_dtype`.
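The mismatch can be illustrated with simple arithmetic, no TorchAO required. This is a hypothetical sketch: the `element_size=2` value assumes an fp16 `torch_dtype`, and the int4 packing formula ignores scales and other quantization metadata.

```python
# Why numel() * element_size() misreports quantized weights:
# int4 packs two values per byte, but if element_size() still reflects
# the original torch_dtype (here fp16 = 2 bytes, an assumption for this
# sketch), the naive formula overestimates the real footprint by 4x.

def naive_bytes(numel, element_size):
    # the formula used for ordinary (non-quantized) parameters
    return numel * element_size

def int4_packed_bytes(numel):
    # two 4-bit values per byte (scales/metadata ignored in this sketch)
    return (numel + 1) // 2

numel = 1024 * 1024
print(naive_bytes(numel, element_size=2))  # 2097152 (fp16 assumption)
print(int4_packed_bytes(numel))            # 524288 -> 4x smaller
```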