We just need to focus on the reading and writing performed on array1. In `union_vector16`, both vectorized and scalar code still obey the basic rule: read from two inputs, do the union, and then write the output.
Let's say the length (cardinality) of input 2 is L2:
```
         |<- L2 ->|
array1: [output--- |input 1---|---]
array2: [input 2---]
```
Let's define three `__m128i` pointers: `pos1` starts at `input1` and `pos2` starts at `input2`, each pointing at the next byte to read; `out` starts at `output` and points at the next byte to overwrite.
```
array1: [output--- |input 1---|---]
         ^          ^
         out        pos1
array2: [input 2---]
         ^
         pos2
```
The union never contains more elements than the two inputs combined. Counting each pointer as the number of elements it has advanced from its starting position, we have:
```
out <= pos1 + pos2
```
Since `pos2` can advance by at most L2 elements, it follows that:
```
out <= pos1 + L2
```
Because `out` starts L2 elements before `pos1` (see the first diagram), `out` never moves past `pos1`. Data that has not been read yet is therefore never overwritten, and data that has already been read no longer matters.
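The argument above can be checked with a small scalar sketch. This is not the library's `union_vector16`: the function name `union_in_place_sketch` and its parameters (`buf`, `n1`, `in2`, `n2`) are hypothetical, and it walks plain `uint16_t` elements instead of `__m128i` vectors. It assumes both inputs are sorted and free of duplicates, that `buf` has room for `n1 + n2` values with input 1 stored at `buf + n2`, and that `in2` does not overlap `buf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of the in-place layout described above.
 * buf holds n1 + n2 slots: the first n2 slots are free space for the output,
 * and input 1 (n1 sorted values) starts at buf + n2.
 * Returns the cardinality of the union written at the start of buf. */
static size_t union_in_place_sketch(uint16_t *buf, size_t n1,
                                    const uint16_t *in2, size_t n2) {
    const uint16_t *pos1 = buf + n2, *end1 = pos1 + n1; /* next read, input 1 */
    const uint16_t *pos2 = in2, *end2 = in2 + n2;       /* next read, input 2 */
    uint16_t *out = buf;                                /* next write */

    while (pos1 < end1 && pos2 < end2) {
        assert(out <= pos1);           /* the write cursor never passes pos1 */
        uint16_t a = *pos1, b = *pos2; /* read both inputs before writing */
        if (a < b) {
            *out++ = a; pos1++;
        } else if (b < a) {
            *out++ = b; pos2++;
        } else {                       /* equal: emit once, advance both */
            *out++ = a; pos1++; pos2++;
        }
    }
    while (pos1 < end1) {              /* rest of input 1 */
        assert(out <= pos1);
        *out++ = *pos1++;
    }
    while (pos2 < end2) {              /* rest of input 2; input 1 is fully read */
        *out++ = *pos2++;
    }
    return (size_t)(out - buf);
}
```

For example, with `buf = {_, _, _, 1, 3, 5, 7}`, `n1 = 4`, and `in2 = {2, 3, 8}`, the first six slots of `buf` end up holding `1, 2, 3, 5, 7, 8` and the function returns 6, without either assertion firing.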