I recently learned about [NVIDIA DALI (Data Loading Library)](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html), which can load and decode data directly on the GPU. It could potentially be used to stream our files straight to GPU memory, reducing the overhead of CPU-to-GPU data transfers. This discussion in the Xarray repository is relevant: https://github.com/pydata/xarray/discussions/9249. There, @weiji14 suggested that DALI could be integrated as a dedicated `BackendEntrypoint`, as an alternative to the kvikio engine, and I like this idea.
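
To make the `BackendEntrypoint` idea a bit more concrete, here is a minimal sketch of what such an engine's skeleton might look like. Only the `xarray.backends.BackendEntrypoint` interface is real here; the class name `DaliBackendEntrypoint`, the helper `_read_array`, the variable name `data`, and the restriction to `.npy` files are all hypothetical. The helper uses `np.load` as a CPU placeholder so the sketch runs anywhere; an actual implementation would presumably swap it for a DALI pipeline (for example one built around DALI's NumPy reader on the GPU) that returns a device array.

```python
import os

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


def _read_array(path):
    """Hypothetical reader. Placeholder only: loads on the CPU with NumPy.

    A DALI-based engine would run a DALI pipeline here and return
    a GPU-resident array instead.
    """
    return np.load(path)


class DaliBackendEntrypoint(BackendEntrypoint):
    """Hypothetical xarray engine sketch; not an existing package."""

    description = "Sketch of a DALI-backed engine (NumPy placeholder)"
    open_dataset_parameters = ("filename_or_obj", "drop_variables")

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        array = _read_array(os.fspath(filename_or_obj))
        # Dimension names are not stored in .npy files, so xarray
        # assigns the defaults dim_0, dim_1, ...
        ds = xr.DataArray(array, name="data").to_dataset()
        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds

    def guess_can_open(self, filename_or_obj):
        # Assumption for this sketch: only plain .npy files are handled.
        try:
            return os.fspath(filename_or_obj).endswith(".npy")
        except TypeError:
            return False
```

With this in place, a file could be opened with `xr.open_dataset("example.npy", engine=DaliBackendEntrypoint)`. A packaged version would instead register the class under the `xarray.backends` entry point group so that it can be selected by name, similar to how the kvikio engine is selected.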