A **block** is a basic unit of computation in TensorIR. Notably, it contains a few additional pieces of information compared to the plain NumPy code: a set of block axes (`vi, vj, vk`) and the computations defined around them.
```python
# Block axis declarations: vi and vj are spatial axes, vk is a reduce axis.
vi = T.axis.spatial(128, i)
vj = T.axis.spatial(128, j)
vk = T.axis.reduce(128, k)
```
In our example, block Y computes the result `Y[vi, vj]` by reading values from `A[vi, vk]` and `B[vk, vj]` and performing a sum over all possible `vk`. In this particular example, if we fix `vi`, `vj` to be `(0, 1)` and run the block for `vk in range(0, 128)`, we can effectively compute `Y[0, 1]` independently from other possible locations (those with different values of `vi`, `vj`).
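
As a plain-NumPy illustration of this independence, here is a sketch that computes only the single location `Y[0, 1]`, assuming `A` and `B` are 128x128 `float32` matrices as in the running example:

```python
import numpy as np

# Stand-ins for the example's 128x128 inputs.
A = np.random.rand(128, 128).astype("float32")
B = np.random.rand(128, 128).astype("float32")

# Fix the spatial location (vi, vj) = (0, 1) and run only the reduce axis vk.
vi, vj = 0, 1
y = np.float32(0)
for vk in range(128):
    y += A[vi, vk] * B[vk, vj]
# y now equals (A @ B)[0, 1] up to floating-point rounding,
# without touching any other (vi, vj) location.
```
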
Notably, for a fixed value of `vi` and `vj`, the computation block produces a point value at a spatial location of `Y` (`Y[vi, vj]`) that is independent from other locations in `Y` (those with different `vi, vj` values). We can call `vi`, `vj` **spatial axes** as they directly correspond to the beginning of a spatial region of buffers that the block writes to. The axes involved in reduction (`vk`) are named **reduce axes**.
#### Why Extra Information in Block
One crucial observation is that the additional information (the block axis ranges and their properties) makes the block **self-contained**: it specifies the iterations the block is supposed to carry out, independently of the external loop nest `i`, `j`, `k`.
The block axis information also provides additional properties that help us validate the correctness of the external loops used to carry out the computation. For example, the code block below will result in an error because the block expects an iterator of size `128`, but we only bound it to a for loop of size `127`.
```python
# Erroneous binding (sketch based on the running example): the block declares
# vi with extent 128, but the outer loop i only iterates over 127 values,
# so TVM reports an error when it checks the binding.
for i in range(127):
    for j in range(128):
        for k in range(128):
            with T.block("Y"):
                vi = T.axis.spatial(128, i)
                vj = T.axis.spatial(128, j)
                vk = T.axis.reduce(128, k)
                with T.init():
                    Y[vi, vj] = T.float32(0)
                Y[vi, vj] = Y[vi, vj] + A[vi, vk] * B[vk, vj]
```
This additional information also helps us in subsequent machine learning compilation analyses. For example, while we can always parallelize over spatial axes, parallelizing over reduce axes requires specific strategies.
#### Sugar for Block Axes Binding
In situations where each of the block axes is directly mapped to an outer loop iterator, we can use `T.axis.remap` to declare all the block axes in a single line.
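
For the running example, the shorthand can be written as follows (`"S"` marks a spatial axis and `"R"` a reduce axis); it is equivalent to the three separate `T.axis.spatial`/`T.axis.reduce` declarations shown earlier:

```python
# SSR: vi and vj are spatial, vk is a reduce axis; i, j, k are the outer loop iterators.
vi, vj, vk = T.axis.remap("SSR", [i, j, k])
```
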
The function attribute information contains extra information about the function.
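
A sketch of what this attribute annotation looks like inside the TVMScript function (the values shown here follow the chapter's `mm_relu` example):

```python
T.func_attr({"global_symbol": "mm_relu", "tir.noalias": True})
```
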
Here `global_symbol` corresponds to the name of the function, and `tir.noalias` is an attribute indicating that none of the buffer memory regions overlap. You can safely skip these attributes for now, as they won't affect the overall understanding of the high-level concepts.
The two decorators, `@tvm.script.ir_module` and `@T.prim_func`, are used to indicate the type of the corresponding part.
`@tvm.script.ir_module` indicates that `MyModule` is an `IRModule`. An IRModule is the container object that holds a collection of tensor functions in machine learning compilation.
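
A minimal sketch of how the two decorators fit together (a hypothetical `MyAddModule`, not the chapter's `MyModule`; the exact buffer annotation syntax varies slightly across TVM versions):

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class MyAddModule:
    @T.prim_func
    def add(A: T.Buffer((128,), "float32"),
            B: T.Buffer((128,), "float32"),
            C: T.Buffer((128,), "float32")):
        # One block per output element; vi is a spatial block axis of extent 128.
        for i in range(128):
            with T.block("C"):
                vi = T.axis.spatial(128, i)
                C[vi] = A[vi] + B[vi]
```
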
So far, we have gone through one example instance of a TensorIR program and covered most of its elements, including:
- Buffer declarations for parameters and intermediate temporary memory.
- For loop iterations.
- **Blocks** and block axis properties.
In this section, we have gone through one example instance of TensorIR that covers the most common elements in MLC.
In the last section, we gave an example of how to write `mm_relu` using low-level numpy. In practice, there can be multiple ways to implement the same functionality, and each implementation can result in different performance.
We will discuss the reasons behind the performance differences and how to leverage those variants in future lectures. In this lecture, let us focus on the ability to get different implementation variants using transformations.
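
A sketch of what such a variant, `lnumpy_mm_relu_v2`, can look like — splitting the `j` loop into `j0` and `j1`, where the inner factor of 4 is an assumption for illustration:

```python
import numpy as np

def lnumpy_mm_relu_v2(A: np.ndarray, B: np.ndarray, C: np.ndarray):
    Y = np.empty((128, 128), dtype="float32")
    for i in range(128):
        for j0 in range(32):
            for k in range(128):
                for j1 in range(4):
                    j = j0 * 4 + j1
                    if k == 0:
                        Y[i, j] = 0
                    Y[i, j] = Y[i, j] + A[i, k] * B[k, j]
    for i in range(128):
        for j in range(128):
            C[i, j] = max(Y[i, j], 0)
```
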
The above code block shows a slightly different variation of `mm_relu`. To see the difference, notice that:
- We replace the `j` loop with two loops, `j0` and `j1`.
- The order of iterations changes slightly.
In order to get `lnumpy_mm_relu_v2`, we would have to write a new function (or manually copy, paste, and edit). TensorIR introduces a utility called Schedule that allows us to do this programmatically.
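
A sketch of the Schedule workflow (assuming the `MyModule` IRModule from earlier in the chapter; the split factor of 4 is illustrative):

```python
import tvm

sch = tvm.tir.Schedule(MyModule)
block_Y = sch.get_block("Y", func_name="mm_relu")
i, j, k = sch.get_loops(block_Y)
# Split j into j0, j1 and move j1 innermost, mirroring lnumpy_mm_relu_v2.
j0, j1 = sch.split(j, factors=[None, 4])
sch.reorder(j0, k, j1)
print(sch.mod.script())  # inspect the transformed TensorIR
```
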
To remind ourselves, let us look again at the current MyModule content.
#### Generate TensorIR code using Tensor Expression
In many cases, our development forms are higher-level abstractions that are not at the loop level. So another common way to obtain TensorIR is to generate the relevant code programmatically.
Tensor expression (te) is a domain-specific language that describes a sequence of computations via an expression-like API. The tensor expression API provides a helpful tool to generate TensorIR functions for a given higher-level input.
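
A sketch of using the te API for the running `mm_relu` example (mirroring the chapter's 128x128 shapes; `te.create_prim_func` turns the expression description into a TensorIR function):

```python
import tvm
from tvm import te

A = te.placeholder((128, 128), "float32", name="A")
B = te.placeholder((128, 128), "float32", name="B")
k = te.reduce_axis((0, 128), "k")
# Matmul followed by ReLU, described as tensor expressions.
Y = te.compute((128, 128), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="Y")
C = te.compute((128, 128), lambda i, j: te.max(Y[i, j], 0), name="C")

# Build a TensorIR PrimFunc and wrap it in an IRModule.
te_func = te.create_prim_func([A, B, C]).with_attr({"global_symbol": "mm_relu"})
MyModuleFromTE = tvm.IRModule({"mm_relu": te_func})
```
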
### TensorIR Functions as Results of Transformations
In practice, we also get TensorIR functions as results of transformations. This happens when we start with two primitive tensor functions (mm and relu) and then apply a programmatic transformation to "fuse" them into a single primitive tensor function, `mm_relu`. We will cover the details in future chapters.
`chapter_tensor_program/tensorir_exercises.md`
Before we directly write TensorIR, we should first translate the high-level computation abstraction (e.g., `ndarray + ndarray`) into a low-level Python implementation (standard for loops with element access and operations).
Notably, the initial value of the output array (or buffer) is not always `0`. We need to write or initialize it in our implementation, which is important for reduction operators (e.g., matmul and conv).
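
A sketch of such a low-level translation for the element-wise add (assuming two 2-D arrays of the same shape; the exercise's actual shapes may differ):

```python
import numpy as np

def lnumpy_add(a: np.ndarray, b: np.ndarray, c: np.ndarray):
    # Element-wise add needs no separate initialization of c;
    # a reduction (e.g. matmul) would need to initialize the output first.
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + b[i, j]
```
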
In the lecture, we learned that TensorIR is not only a programming language but also an abstraction for program transformation. In this section, let's try to transform the program. We take `bmm_relu` (`batched_matmul_relu`) as our example, which is a variant of operations that commonly appear in models such as transformers.
#### Parallel, Vectorize and Unroll
First, we introduce some new primitives: `parallel`, `vectorize`, and `unroll`. These three primitives operate on loops to indicate how the loop executes. Here is an example:
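
A sketch of how these primitives are applied through a schedule (the module `MyAdd`, its function name `add`, the block name `C`, and the split factor are illustrative assumptions):

```python
import tvm

sch = tvm.tir.Schedule(MyAdd)                  # MyAdd: an IRModule with an `add` PrimFunc
block_C = sch.get_block("C", func_name="add")
i, j = sch.get_loops(block_C)
i0, i1 = sch.split(i, factors=[None, 4])
sch.parallel(i0)      # run the outer loop across threads
sch.unroll(i1)        # fully unroll the small inner loop
sch.vectorize(j)      # vectorize the innermost loop
print(sch.mod.script())
```
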