Commit c8b9aca
ssjia
Update on "[ET-VK][ez][qconv] Add auto-selection to prefer im2col for q8ta_conv2d"
The q8ta_conv2d operator previously always delegated to the general (sliding window) implementation, even though the im2col implementation is 2-5x faster for non-grouped convolutions with in_channels % 4 == 0. This change adds runtime auto-selection logic that checks the groups parameter and input channel alignment, then dispatches to q8ta_conv2d_im2col when its constraints are met. On ResNet50 int8, this reduces Vulkan inference latency from 14.2ms to 6.8ms (2.1x speedup) on Samsung Galaxy S24, making it 30% faster than XNNPACK (9.7ms). Also adds performance test cases for deep-channel small-spatial scenarios (512ch 7x7, 1024→2048ch 1x1 stride-2) that stress-test the optimization.
Differential Revision: [D93768637](https://our.internmc.facebook.com/intern/diff/D93768637/)
[ghstack-poisoned]File tree
3 files changed
+6
-14
lines changed- backends/vulkan/runtime/graph/ops/glsl
- third-party
3 files changed
+6
-14
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
384 | 384 | | |
385 | 385 | | |
386 | 386 | | |
387 | | - | |
| 387 | + | |
388 | 388 | | |
389 | 389 | | |
390 | 390 | | |
| |||
Lines changed: 5 additions & 11 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
106 | 106 | | |
107 | 107 | | |
108 | 108 | | |
109 | | - | |
110 | | - | |
111 | | - | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
112 | 112 | | |
113 | | - | |
114 | | - | |
| 113 | + | |
| 114 | + | |
115 | 115 | | |
116 | 116 | | |
117 | | - | |
118 | | - | |
119 | | - | |
120 | | - | |
121 | | - | |
122 | | - | |
123 | 117 | | |
124 | 118 | | |
125 | 119 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
16 | 15 | | |
17 | 16 | | |
18 | 17 | | |
| |||
21 | 20 | | |
22 | 21 | | |
23 | 22 | | |
24 | | - | |
25 | 23 | | |
26 | 24 | | |
0 commit comments