implement lazy rasterstack #62

Open
wants to merge 35 commits into
base: main
35 commits
ad7aaf1
WIP
emmanuelmathot Feb 20, 2025
2a19e59
implement lazy raster cube with futures wrapper
emmanuelmathot Mar 3, 2025
22a4492
Add reduce_dimension process and enhance processes for RasterStack su…
emmanuelmathot Mar 13, 2025
892df6b
Update resample_spatial schema subtype to 'datacube' and enhance impo…
emmanuelmathot Mar 13, 2025
0735008
Refactor imports and improve code formatting in data model and implem…
emmanuelmathot Mar 13, 2025
afb40d2
Add LoadStac process and enhance SaveResultData for GeoTIFF support
emmanuelmathot Mar 16, 2025
890d639
Refactor import statements and improve code formatting in main, facto…
emmanuelmathot Mar 16, 2025
b6876c0
Add aggregate_spatial process and enhance reduce_dimension for Raster…
emmanuelmathot Mar 16, 2025
ac40f99
Enhance SaveResultData and aggregate_spatial to support RasterStack a…
emmanuelmathot Mar 18, 2025
6f399b3
Enhance array_element and save_result functions to support ImageData …
emmanuelmathot Mar 19, 2025
3f16f5e
Add CSV support for FeatureCollection data and implement retry logic …
emmanuelmathot Mar 19, 2025
3ca97e7
Refactor code structure and improve documentation for better maintain…
emmanuelmathot Mar 19, 2025
120b78d
better doc for notebooks
emmanuelmathot Mar 25, 2025
4a92d37
improve nb
emmanuelmathot Mar 25, 2025
363f126
Refactor item asset handling
emmanuelmathot Mar 26, 2025
d1c52bf
Refactor data model to enforce RasterStack usage and improve processi…
emmanuelmathot Mar 26, 2025
91bdc6d
Refactor imports and improve code consistency across implementations
emmanuelmathot Mar 26, 2025
b59dad2
Enhance documentation and data model with RasterStack implementation …
emmanuelmathot Apr 7, 2025
6b3f471
Merge branch 'main' into futures
emmanuelmathot Apr 7, 2025
e27177b
Update titiler/openeo/main.py
emmanuelmathot Apr 8, 2025
984e3f4
Merge branch 'main' into futures
emmanuelmathot Apr 14, 2025
39ed977
Implement code changes to enhance functionality and improve performance
emmanuelmathot Apr 14, 2025
9514e09
Merge branch 'main' into futures
emmanuelmathot Apr 14, 2025
8132974
Enhance Copernicus service and add new mathematical processes
emmanuelmathot Apr 15, 2025
f2f7a7f
Refactor reduction process tests and enhance output validation; impro…
emmanuelmathot Apr 15, 2025
3410a7a
Refactor code for improved readability and maintainability; clean up …
emmanuelmathot Apr 15, 2025
a839ead
Enhance array_element function with index and label validation; impro…
emmanuelmathot Apr 15, 2025
d397449
Refactor LoadCollection class to simplify property references; remove…
emmanuelmathot Apr 17, 2025
f6504c2
Refactor LoadCollection class to streamline property references; remo…
emmanuelmathot Apr 17, 2025
6437616
Update titiler/openeo/factory.py
emmanuelmathot Apr 17, 2025
24d8a06
Refactor array_element function parameter types to use Optional for i…
emmanuelmathot Apr 17, 2025
66e69f7
Merge branch 'futures' of https://github.com/sentinel-hub/titiler-ope…
emmanuelmathot Apr 17, 2025
bbee0ca
adjust types and remove pyproj
vincentsarago Apr 18, 2025
b176873
fix type
vincentsarago Apr 18, 2025
c64d259
Merge pull request #66 from sentinel-hub/review/edits--pr-62
vincentsarago Apr 18, 2025
13 changes: 11 additions & 2 deletions CHANGES.md
@@ -1,12 +1,21 @@
# Release Notes

## [Unreleased] (TBD)

### Added
- LazyRasterStack implementation for improved performance with on-demand data loading
- Consistent RasterStack usage across all spatial and reduction processes

### Changed
- Refactored data model to enforce RasterStack usage for more consistent processing
- Improved handling of STAC Items and Assets
- Refactored imports and improved code consistency across implementations
- Enhanced notebook examples with better documentation

## Unreleased

- Added support for converting OpenEO process graphs to CQL2-JSON format for STAC API filtering, improving interoperability with OpenEO filters.

## [0.1.0] (2025-04-07)

Initial release

[0.1.0]: <https://github.com/sentinel-hub/titiler-openeo/releases/tag/0.1.0>
2 changes: 2 additions & 0 deletions README.md
@@ -24,6 +24,8 @@ The application provides with a minimal [openEO API (L1A and L1C)](https://opene
- Dynamic tiling services
- FastAPI-based application
- Middleware for CORS, compression, and caching
- Optimized RasterStack data model for consistent processing
- LazyRasterStack implementation for improved performance

## Roadmap

5 changes: 4 additions & 1 deletion docs/mkdocs.yml
@@ -1,4 +1,4 @@
site_name: TiTiler-OpenEO
site_name: openEO by TiTiler
site_description: TiTiler backend for openEO

docs_dir: 'src'
@@ -32,6 +32,9 @@ extra:

nav:
- TiTiler-OpenEO: "index.md"
- Concepts:
- Overview: "concepts.md"
- RasterStack Data Model: "raster-stack.md"
- Development - Contributing: "contributing.md"
- Release Notes: "release-notes.md"

15 changes: 11 additions & 4 deletions docs/src/concepts.md
@@ -45,18 +45,25 @@ In openEO, a datacube is a fundamental concept and a key component of the platfo
Datacubes are powerful but can be heavy to manipulate and often require asynchronous processing to properly process and serve the data.
Unlike most existing openEO implementations, the `titiler-openeo` project simplifies this concept by focusing on image raster data that can be processed on the fly and served as tiles or as light dynamic raw data.

### Raster with ImageData
### Raster Data Model

In order to make the processing as light and fast as possible, the backend must manipulate the data in a way that is easy to process and serve.
That is why most of the processes use [`ImageData`](https://github.com/sentinel-hub/titiler-openeo/blob/43702f98cbe2b418c4399dbdefd8623af446b237/titiler/openeo/processes/data/load_collection_and_reduce.json#L225) object type for passing data between the nodes of a process graph.
[`ImageData`](https://cogeotiff.github.io/rio-tiler/models/#imagedata) is provided by [rio-tiler](https://cogeotiff.github.io/rio-tiler/) that was initially designed to create slippy map tiles from large raster data sources and render these tiles dynamically on a web map.
There are three primary data structures used in the backend:

1. **ImageData**: Most processes use [`ImageData`](https://cogeotiff.github.io/rio-tiler/models/#imagedata) objects provided by [rio-tiler](https://cogeotiff.github.io/rio-tiler/) for individual raster operations. This object was initially designed to create slippy map tiles from large raster data sources and render these tiles dynamically on a web map.

![alt text](img/raster.png)

2. **RasterStack**: A dictionary mapping names/dates to ImageData objects, allowing for consistent handling of multiple raster layers.

3. **LazyRasterStack**: An optimized version of RasterStack that lazily loads data when accessed. This improves performance by only executing processing tasks when the data is actually needed.

### Reducing the data

The ImageData object is obtained by reducing the data from the collections as early as possible.
While the traditional [`load_collections` process](https://github.com/sentinel-hub/titiler-openeo/blob/43702f98cbe2b418c4399dbdefd8623af446b237/titiler/openeo/processes/data/load_collection.json#L2) is implemented and can be used, it is recommended to use the `load_collection_and_reduce` process to have immediately an `imagedata` object to manipulate. The `load_collection_and_reduce` process actually apply the [`apply_pixel_selection`](https://github.com/sentinel-hub/titiler-openeo/blob/main/titiler/openeo/processes/data/apply_pixel_selection.json) process on a stack of raster data that are loaded from the collections.
While the traditional [`load_collection` process](https://github.com/sentinel-hub/titiler-openeo/blob/43702f98cbe2b418c4399dbdefd8623af446b237/titiler/openeo/processes/data/load_collection.json#L2) is implemented and can be used, the `load_collection_and_reduce` process is recommended because it immediately yields an `imagedata` object to manipulate. It applies the [`apply_pixel_selection`](https://github.com/sentinel-hub/titiler-openeo/blob/main/titiler/openeo/processes/data/apply_pixel_selection.json) process to a stack of raster data loaded from the collections.

All spatial and reduction processes have been refactored to consistently use RasterStack as input and output, providing a more predictable and efficient processing pipeline.

![alt text](img/rasterstack.png)

128 changes: 128 additions & 0 deletions docs/src/raster-stack.md
@@ -0,0 +1,128 @@
# RasterStack Data Model

In titiler-openeo, the RasterStack data model is central to how raster data is represented and processed throughout the system. This document explains the RasterStack concept, its implementation, and the performance benefits it provides.

## Overview

The RasterStack is a dictionary-like structure that maps names or dates to `ImageData` objects, allowing for consistent handling of multiple raster layers. This approach simplifies the processing of Earth Observation data by providing a unified interface for operations on raster data.

```python
# Example of RasterStack structure
RasterStack = {
    "2023-01-01": ImageData(...),  # First date
    "2023-01-15": ImageData(...),  # Second date
    "2023-02-01": ImageData(...),  # Third date
}
```

## ImageData vs RasterStack

- **ImageData**: Single raster layer, with dimensions, bounds, CRS, and metadata
- **RasterStack**: Collection of named ImageData objects, typically representing different dates or bands

## LazyRasterStack

The LazyRasterStack extends the basic RasterStack concept by implementing lazy loading of data:

```python
# LazyRasterStack only loads data when accessed
raster_stack = LazyRasterStack(tasks, date_name_fn)

# Data is only loaded when accessed
image_data = raster_stack["2023-01-01"] # This triggers loading
```
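
The lazy access pattern above can be illustrated with a small, self-contained sketch. It is not the `LazyRasterStack` implementation from this PR: `LazyStackSketch`, `make_loader`, and the string payloads are hypothetical names chosen for illustration, and the real class works with rio-tiler tasks rather than plain callables.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator, List


class LazyStackSketch(Mapping):
    """Hypothetical, simplified stand-in for a lazy stack (not the real class)."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders            # key -> zero-argument callable
        self._cache: Dict[str, Any] = {}   # results of loaders that already ran

    def __getitem__(self, key: str) -> Any:
        if key not in self._cache:
            # The deferred work happens here, on first access only.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        # Iterating over keys does not run any loader.
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


calls: List[str] = []  # records which loaders actually ran


def make_loader(name: str) -> Callable[[], str]:
    def load() -> str:
        calls.append(name)
        return f"image for {name}"

    return load


stack = LazyStackSketch({d: make_loader(d) for d in ["2023-01-01", "2023-01-15"]})
```

Creating `stack` runs no loader. The first `stack["2023-01-01"]` runs exactly one loader and caches its result, so a repeated access does not trigger the work again.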

### Key Features of LazyRasterStack

1. **On-demand Loading**: Data is loaded only when actually accessed, reducing memory usage for large collections
2. **Task-based Execution**: Uses rio-tiler's task system to efficiently process data
3. **Exception Handling**: Gracefully handles common exceptions like TileOutsideBounds
4. **Dictionary Interface**: Maintains the familiar dictionary interface for easy integration

## Advantages of the RasterStack Model

- **Consistency**: All processes now use a consistent data structure
- **Performance**: LazyRasterStack reduces memory footprint and improves performance
- **Predictability**: Standardized input/output for all operations
- **Flexibility**: Works well with time series and multi-band data

## How Processing Works with RasterStack

### Load Phase
Data is loaded from collections into a LazyRasterStack structure:

```python
# Process graph example
{
    "process_id": "load_collection",
    "arguments": {
        "id": "sentinel-2-l2a",
        "spatial_extent": {...},
        "temporal_extent": ["2023-01-01", "2023-03-01"],
        "bands": ["B04", "B08"]
    }
}
```

### Process Phase
Operations are applied uniformly to items in the RasterStack:

```python
# Process graph example to calculate NDVI
{
    "process_id": "normalized_difference",
    "arguments": {
        "x": {"from_node": "load_collection", "band": "B08"},
        "y": {"from_node": "load_collection", "band": "B04"}
    }
}
```
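
To make "applied uniformly to items" concrete, here is a minimal sketch of the same computation run over every entry of a stack. It rests on stated assumptions: each entry is modelled as a plain dict of band name to a flat list of pixel values instead of an `ImageData`, `ndvi_stack` is a hypothetical helper, and returning `0.0` where the band sum is zero is a choice made for this sketch, not behaviour confirmed by the PR.

```python
from typing import Dict, List

# Hypothetical stand-in: one stack entry maps a band name to flat pixel values.
Bands = Dict[str, List[float]]


def normalized_difference(x: List[float], y: List[float]) -> List[float]:
    """Compute (x - y) / (x + y) per pixel; 0.0 where the sum is zero (sketch choice)."""
    return [(a - b) / (a + b) if (a + b) != 0 else 0.0 for a, b in zip(x, y)]


def ndvi_stack(stack: Dict[str, Bands]) -> Dict[str, List[float]]:
    """Apply the same NDVI computation to every date in the stack."""
    return {
        date: normalized_difference(bands["B08"], bands["B04"])
        for date, bands in stack.items()
    }


stack = {
    "2023-01-01": {"B04": [1.0, 2.0], "B08": [3.0, 2.0]},
    "2023-01-15": {"B04": [0.0, 1.0], "B08": [0.0, 3.0]},
}
ndvi = ndvi_stack(stack)
```

The output keeps the keys of the input, so downstream steps can still address results by date.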

### Output Phase
Results are rendered as a single image or maintained as a RasterStack:

```python
# Process graph example to save result
{
    "process_id": "save_result",
    "arguments": {
        "data": {"from_node": "normalized_difference"},
        "format": "png"
    }
}
```

## Code Examples

Handling a RasterStack with basic operations:

```python
import numpy as np
from rio_tiler.models import ImageData

from titiler.openeo.processes.implementations.data_model import to_raster_stack

# Convert a single ImageData (here a placeholder 1-band array) to a RasterStack
img_data = ImageData(np.zeros((1, 256, 256), dtype="uint8"))
raster_stack = to_raster_stack(img_data)  # {"data": img_data}


# Process each image in a RasterStack consistently
def apply_to_raster_stack(raster_stack, func):
    """Apply a function to each ImageData in a RasterStack."""
    return {k: func(v) for k, v in raster_stack.items()}
```

## Performance Benefits

The LazyRasterStack implementation provides several performance benefits:

1. **Memory Efficiency**: Only loads data that is actually used
2. **Computation Efficiency**: Defers expensive computations until needed
3. **Error Resilience**: Handles exceptions during computation without failing the entire process
4. **Scalability**: Better handles large datasets with many dates/bands
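
The error-resilience point can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `OutsideBoundsSketch` is a hypothetical stand-in for an error such as rio-tiler's `TileOutsideBounds`, and `collect_results` is a made-up helper name.

```python
from typing import Any, Callable, Dict, Tuple, Type


class OutsideBoundsSketch(Exception):
    """Hypothetical stand-in for an error such as TileOutsideBounds."""


def collect_results(
    tasks: Dict[str, Callable[[], Any]],
    allowed: Tuple[Type[Exception], ...] = (OutsideBoundsSketch,),
) -> Dict[str, Any]:
    """Run each task and drop the ones that raise an allowed exception."""
    results: Dict[str, Any] = {}
    for key, task in tasks.items():
        try:
            results[key] = task()
        except allowed:
            # Skip this entry; any other exception type still propagates.
            continue
    return results


def outside() -> Any:
    raise OutsideBoundsSketch("no overlap with the requested extent")


results = collect_results({"2023-01-01": lambda: "ok", "2023-01-15": outside})
```

Only the listed exception types are swallowed, so a genuine bug in a task still surfaces instead of silently shrinking the result.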

## Best Practices

When working with the RasterStack data model:

1. Use `to_raster_stack()` to ensure consistent handling of both single images and collections
2. Prefer using LazyRasterStack for large collections
3. Design processes to operate on RasterStack inputs and produce RasterStack outputs
4. Use the dictionary interface (keys, values, items) for flexible processing
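
Practices 1 and 3 can be combined in a small wrapper, sketched below with plain Python values. `ensure_stack` and `stack_process` are hypothetical names; `ensure_stack` only mimics the `{"data": ...}` wrapping described above for `to_raster_stack()` and is not that function.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict


def ensure_stack(data: Any) -> Mapping:
    """Hypothetical helper: wrap a single item as {"data": item}; pass mappings through."""
    if isinstance(data, Mapping):
        return data
    return {"data": data}


def stack_process(func: Callable[[Any], Any]) -> Callable[[Any], Dict[str, Any]]:
    """Turn a per-image function into one that accepts and returns a stack."""

    def wrapper(data: Any) -> Dict[str, Any]:
        return {key: func(item) for key, item in ensure_stack(data).items()}

    return wrapper


double = stack_process(lambda value: value * 2)
single = double(21)                # single item in, stack out
multi = double({"a": 1, "b": 2})   # stack in, stack out
```

Note that the wrapper iterates over every item, so calling it on a lazy stack would load all entries at that point.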
61 changes: 61 additions & 0 deletions notebooks/load_raster_stack.ipynb
@@ -0,0 +1,61 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# STAC Collections and Items\n",
"\n",
"With the `load_stac` process it's possible to load and use data provided by remote or local STAC Collections or Items. The following code snippet loads Sentinel-2 L2A data from a public STAC Catalog, using specific spatial and temporal extent, band name and also properties for cloud coverage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import openeo\n",
"\n",
"# Connect to the back-end\n",
"connection = openeo.connect(\"http://127.0.0.1:8081/\")\n",
"# ToDo: Here you need to authenticate with authenticate_basic() or authenticate_oidc()\n",
"connection.authenticate_oidc()\n",
"\n",
"url = \"https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a\"\n",
"spatial_extent = {\"west\": 11, \"east\": 12, \"south\": 46, \"north\": 47}\n",
"temporal_extent = [\"2019-01-01\", \"2019-01-15\"]\n",
"bands = [\"B04_60m\"]\n",
"properties = {\"eo:cloud_cover\": {\"lt\": 50}}\n",
"s2_cube = connection.load_stac(url=url,\n",
" spatial_extent=spatial_extent,\n",
" temporal_extent=temporal_extent,\n",
" bands=bands,\n",
" # properties=properties,\n",
")\n",
"s2_cube.execute()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
186 changes: 160 additions & 26 deletions notebooks/manhattan.ipynb

Large diffs are not rendered by default.
