Hi again! This continues the discussion from opendatacube/odc-stac#177. As mentioned in the other issue, I'm using HLS data (both the HLSL30.v2.0 and HLSS30.v2.0 collections) for a short time frame (5-16-2024 through 5-18-2024) over a small region of interest in Washington state. I am currently using odc-geo==0.4.6, odc-stac==0.3.9, and odc-algo==0.2.2. We're generating composites for the 15th, 50th, and 90th percentiles. I've attached a Jupyter notebook with a reproducible example.
As mentioned in the odc-stac issue, using a slightly different method, we anticipate the percentile composites to look something like this (the 50th percentile is shown here).
However, when using xr_quantile, we're getting different outputs. This is what the RGB of the cloud-masked input data looks like, one time slice per day.
This is what the 15th, 50th, and 90th percentiles look like when generated with xr_quantile. Given how much coverage there is with the input imagery over the three days, we would expect each percentile image to have lots of coverage with at least one valid value per pixel (as opposed to the widespread NaN values, which are mostly present in the 90th percentile).
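To make that expectation concrete, here is a minimal NumPy sketch on a made-up 3-day, 2x2 stack (the values and the `stack` name are purely illustrative, not taken from the HLS data). It counts valid observations per pixel and shows that a NaN-ignoring percentile yields a value wherever that count is at least one. It says nothing about how xr_quantile behaves internally.

```python
import numpy as np

# Hypothetical (time, y, x) stack: 3 days over a 2x2 window, NaN = masked.
stack = np.array([
    [[np.nan, 500.0], [100.0, np.nan]],
    [[500.0, np.nan], [np.nan, np.nan]],
    [[1000.0, 1000.0], [np.nan, 300.0]],
])

# Number of non-NaN observations available at each pixel.
valid_count = np.isfinite(stack).sum(axis=0)

# NaN-ignoring percentiles along the time axis; result shape is (3, y, x).
composites = np.nanpercentile(stack, [15, 50, 90], axis=0)

# Every pixel has at least one valid observation, so no composite is NaN.
has_gaps = np.isnan(composites).any()
```

Running the same `valid_count` check on the real cloud-masked input would show whether the NaN areas in the 90th percentile image line up with pixels that truly have no valid observations.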
Based on these results, I have several questions about the algorithm. Starting with a simple example, if there are 3 scenes that overlap for a particular pixel, each from a different day, a pixel could have up to 3 values. Let's say those three values are 100, 500, and 1000. With the way we compute percentiles, the 15th, 50th, and 90th percentiles would be 220, 500, and 900, respectively.
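For reference, the numbers above match what NumPy's `np.percentile` gives with its default linear interpolation. This is only a sketch of the arithmetic we expect, not a statement about what xr_quantile does:

```python
import numpy as np

# Three overlapping observations for one pixel, one per day.
pixel = np.array([100.0, 500.0, 1000.0])

# Default method is linear interpolation between the sorted values.
p15, p50, p90 = np.percentile(pixel, [15, 50, 90])
```

Here the 15th percentile sits 30% of the way from 100 to 500, and the 90th sits 80% of the way from 500 to 1000.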
- How does xr_quantile handle NaN values in the pixel value arrays? For the three day example, (nan, 500, 1000), np.percentile() would return all NaN, but np.nanpercentile() would return 15th=575, 50th=750, 90th=950. How would xr_quantile handle the (nan, 500, 1000) array?
- Related to the first question, is this what the nodata argument in xr_quantile is intended for? For example, if I use nodata=0, would the array (nan, 500, 1000) become (0, 500, 1000), so that the outputs are 15th=150, 50th=500, 90th=900?
- Do we have any idea what might be going on with the coverage at the different quantiles? As I mentioned, we would expect all three days' worth of pixel values to be accounted for in each quantile calculation, so that each image has coverage like quantile=0.15 does.
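The NumPy figures quoted in the first two questions can be reproduced with the snippet below. It only exercises `np.percentile` and `np.nanpercentile`; replacing NaN with 0 is my guess at what a nodata value of 0 might amount to, not confirmed xr_quantile behaviour.

```python
import numpy as np

qs = [15, 50, 90]
pixel = np.array([np.nan, 500.0, 1000.0])

# np.percentile propagates NaN: every requested percentile comes back NaN.
propagated = np.percentile(pixel, qs)

# np.nanpercentile drops the NaN and interpolates over (500, 1000).
ignored = np.nanpercentile(pixel, qs)

# Assumption: treating nodata=0 as "replace NaN with 0 before computing".
filled = np.percentile(np.where(np.isnan(pixel), 0.0, pixel), qs)
```

Knowing which of these three behaviours (if any) xr_quantile follows would help explain the gaps we see.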
replication-code-odc-algo.zip
Question -- what's the best way to provide you with the sample dataset? It's too big to attach here.