Commit e930f72

UDF docs: add recommendations about dimension handling (#564)
1 parent b6ab98a commit e930f72

1 file changed: +23 -0 lines changed

docs/udf.rst

@@ -78,6 +78,18 @@ in smaller chunks and process these chunks separately (e.g. on isolated workers)
Consequently, it's important that your **UDF algorithm operates correctly
in such a chunked processing context**.

A very common mistake is to use index-based array indexing rather than name-based indexing. The index-based approach
assumes that the dimension order of the datacube is fixed, which is not guaranteed. It also reduces the readability
of your code. Label-based indexing is a great feature of xarray and should be used whenever possible.
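As a minimal sketch of the difference, the snippet below builds a small hypothetical chunk and selects the same band by position and by name. The dimension names ``t``, ``bands``, ``y``, ``x`` and the band labels are assumptions for illustration, not something the backend guarantees. Only the name-based selection keeps working when the dimension order changes.

```python
import numpy as np
import xarray as xr

# Hypothetical chunk; dimension names and band labels are assumptions.
cube = xr.DataArray(
    np.arange(2 * 3 * 4 * 5, dtype="float32").reshape(2, 3, 4, 5),
    dims=("t", "bands", "y", "x"),
    coords={"bands": ["B02", "B03", "B04"]},
)

# Fragile: silently assumes that "bands" is the second axis.
by_position = cube[:, 2, :, :]

# Robust: selects by dimension name and coordinate label.
by_name = cube.sel(bands="B04")

# The same data, delivered with a different dimension order.
reordered = cube.transpose("bands", "t", "y", "x")

# Name-based selection still returns the B04 band ...
by_name_reordered = reordered.sel(bands="B04")

# ... while a positional index on the second axis now slices along "t".
by_position_reordered = reordered[:, 1, :, :]
```

The positional variant does not fail loudly here: it returns an array with the wrong dimensions, which is why this kind of mistake tends to surface late.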

As a rule of thumb, the UDF should preserve the dimensions and shape of the input
data cube. The datacube chunk that is passed on by the backend does not have a fixed
specification, so the UDF needs to be able to accommodate different shapes and sizes of the data.
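To illustrate, here is a minimal sketch of a UDF that is agnostic to the shape and dimension order of the chunk it receives. The ``apply_datacube(cube, context)`` signature is assumed here as the UDF entry point, and the ``scale`` context key is hypothetical. Because the function relies only on element-wise xarray arithmetic, the output keeps the dimensions and shape of the input.

```python
import numpy as np
import xarray as xr


def apply_datacube(cube: xr.DataArray, context: dict) -> xr.DataArray:
    # Element-wise rescaling: no assumption about shape or dimension order.
    scale = context.get("scale", 0.0001)  # hypothetical context parameter
    return cube * scale


# Try the UDF on two differently shaped, hypothetical chunks.
results = []
for dims, shape in [
    (("bands", "y", "x"), (2, 4, 4)),
    (("t", "bands", "y", "x"), (3, 2, 8, 6)),
]:
    chunk = xr.DataArray(np.ones(shape), dims=dims)
    results.append((chunk, apply_datacube(chunk, {"scale": 0.5})))
```

Running the same function against several chunk layouts locally is a cheap way to check that it does not depend on one particular shape.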

There are important exceptions to this rule, which depend on the context in which the UDF is used.
For instance, a UDF used as a reducer should effectively remove the reduced dimension from the
output chunk. These details are documented in the next sections.

UDFs as apply/reduce "callbacks"
---------------------------------

@@ -347,6 +359,17 @@ the datacube.
{'dimension': 'y', 'value': 8, 'unit': 'px'}
])

.. warning::

   The ``apply_neighborhood`` process is the most versatile, but also the most complex process. Make sure to keep an eye on the dimensions
   and the shape of the DataArray returned by your UDF. For instance, a very common error is to accidentally 'flip' the spatial dimensions.
   Debugging the UDF locally can help, but then you will want to reproduce the input that the UDF receives on the backend.
   This can typically be achieved by using logging to inspect the DataArrays passed into your UDF on the backend side.
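A minimal sketch of such a check follows. It uses Python's standard ``logging`` module as a stand-in for whatever logging facility your backend exposes to UDFs; the logger name and the ``process_chunk`` function are hypothetical. It logs the dimensions and shape of the incoming chunk and restores the input dimension order by name before returning, which guards against accidentally swapped ``x`` and ``y`` dimensions.

```python
import logging

import numpy as np
import xarray as xr

logger = logging.getLogger("udf_debug")  # hypothetical logger name


def process_chunk(cube: xr.DataArray) -> xr.DataArray:
    # Record what the UDF actually receives, so it can be reproduced locally.
    logger.info("input dims=%s shape=%s", cube.dims, cube.shape)

    # Stand-in for real processing that happens to return x before y.
    other_dims = [d for d in cube.dims if d not in ("x", "y")]
    result = cube.transpose(*other_dims, "x", "y") * 2.0

    # Restore the input dimension order by name before returning.
    result = result.transpose(*cube.dims)
    logger.info("output dims=%s shape=%s", result.dims, result.shape)
    return result


# Hypothetical neighborhood chunk for a local test run.
chunk = xr.DataArray(np.zeros((2, 4, 6)), dims=("bands", "y", "x"))
output = process_chunk(chunk)
```

The logged dimensions and shapes can then be used to construct an equivalent DataArray locally, as done with ``chunk`` above.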

Example: Smoothing timeseries with a user defined function (UDF)
==================================================================
