robust/ranged download support #747

Open
soxofaan opened this issue Mar 10, 2025 · 4 comments · May be fixed by #759
Comments

@soxofaan
Member

EO data downloads can be pretty big, and big transfers can be brittle in some situations.
If the server supports ranged downloads (HTTP Range requests), the download can be made more robust, e.g. by fetching the file in chunks and retrying only the chunks that fail.
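
As a rough illustration of the idea (a sketch, not how the client currently does it): with a server that honours the HTTP `Range` header, the file can be fetched in byte ranges, so a failed transfer only costs one chunk instead of the whole download. The helper names `byte_ranges` and `fetch_range` and the timeout value are assumptions made up for this example.

```python
import urllib.request
from typing import Iterator, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets that together cover `total_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        # HTTP byte ranges are inclusive on both ends, hence the "- 1".
        yield start, min(start + chunk_size, total_size) - 1


def fetch_range(url: str, start: int, end: int, timeout: float = 60) -> bytes:
    """GET a single byte range; a range-aware server answers 206 Partial Content."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            # A plain 200 means the server ignored the Range header.
            raise RuntimeError(f"Expected 206 Partial Content, got {response.status}")
        return response.read()
```

For example, `list(byte_ranges(10, 4))` gives `[(0, 3), (4, 7), (8, 9)]`, and each pair can be passed to `fetch_range` independently.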

@soxofaan
Member Author

@pvbouwel

Some notes on getting more production-like code:

  • Make sure to limit requests to GET operations (see rev2 with the 0-0 byte range), as that also works for things like pre-signed S3 URLs
  • Make sure retries are done for certain HTTP status codes
  • Rather than writing immediately to the target file, keep chunks separate until all chunks are downloaded and then reconstruct the file (to reap more benefits of retries)
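
The three notes above could be combined roughly as follows. This is a minimal sketch under assumptions, not the code from the prototype: the function names, the set of retryable status codes, the back-off and the chunk size are all illustrative choices. The size probe uses a `GET` with `Range: bytes=0-0` rather than `HEAD`, because a pre-signed S3 URL is signed for one specific HTTP method.

```python
import os
import tempfile
import time
import urllib.error
import urllib.request
from typing import Callable

RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}  # assumption: tune per backend


def parse_total_size(content_range: str) -> int:
    """Extract the total size from a header value like 'bytes 0-0/1234'."""
    total = content_range.rsplit("/", 1)[-1]
    if not total.isdigit():
        raise ValueError(f"No total size in Content-Range: {content_range!r}")
    return int(total)


def probe_size(url: str) -> int:
    """Learn the file size with a 1-byte ranged GET (no HEAD request involved)."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request) as response:
        return parse_total_size(response.headers.get("Content-Range", ""))


def http_fetch(url: str, start: int, end: int) -> bytes:
    """Ranged GET for the inclusive byte range start..end."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def with_retries(fetch: Callable[[], bytes], attempts: int = 5, delay: float = 1.0) -> bytes:
    """Call `fetch`, retrying on selected HTTP status codes and connection errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as e:  # must come before URLError (its parent)
            if e.code not in RETRYABLE_STATUS or attempt == attempts:
                raise
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            if attempt == attempts:
                raise
        time.sleep(delay * attempt)  # simple linear back-off
    raise ValueError("attempts must be at least 1")


def download_chunked(fetch_range: Callable[[int, int], bytes], total_size: int,
                     target: str, chunk_size: int = 10 * 1024 * 1024) -> None:
    """Store every chunk in its own temp file, then assemble `target` at the end."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        parts = []
        for start in range(0, total_size, chunk_size):
            end = min(start + chunk_size, total_size) - 1
            data = with_retries(lambda: fetch_range(start, end))
            if len(data) != end - start + 1:
                raise IOError(f"Short read for bytes {start}-{end}")
            part = os.path.join(tmp_dir, f"part-{start}")
            with open(part, "wb") as f:
                f.write(data)
            parts.append(part)
        # The target file is only written once every chunk has arrived.
        with open(target, "wb") as out:
            for part in parts:
                with open(part, "rb") as f:
                    out.write(f.read())
```

A call could then look like `download_chunked(lambda s, e: http_fetch(url, s, e), probe_size(url), "result.tif")`. Passing the fetch function in as a callable keeps the retry and reassembly logic independent of the HTTP library.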

@mbuchhorn

We solved that in the WEED project, for cases where we have to download the data locally, in this way:
The openEO pipeline always dumps the files into an S3 bucket, and then in the jobmanager we have a threaded download that gets the files from S3 and, after a successful download (ETag check), deletes the file on S3. This is even way faster than downloading directly from openEO.
Example: for a 4 GB produced file, a direct download to local disk takes roughly 20-25 min after the file has been successfully processed. With our setup the download takes roughly 3-4 minutes.
https://github.com/ESA-WEED-project/eo_processing/blob/main/src/eo_processing/utils/jobmanager.py#L260
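
For readers who don't want to dig through the linked job manager, the general pattern (download in parallel, verify, then delete the remote copy) could look something like the sketch below. It is not the WEED implementation: `md5_hex`, `fetch_verified` and `fetch_all` are hypothetical names, and the S3 access is passed in as plain `download` and `delete` callables. The check also assumes the ETag is the plain MD5 hex digest of the object, which holds for single-part uploads that are not KMS-encrypted but not for multipart uploads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def md5_hex(path: str, block_size: int = 1024 * 1024) -> str:
    """MD5 hex digest of a local file, read block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def fetch_verified(key: str, etag: str, local_path: str,
                   download: Callable[[str, str], None],
                   delete: Callable[[str], None]) -> bool:
    """Download one object; delete the remote copy only if the checksum matches."""
    download(key, local_path)
    # S3 reports ETags wrapped in double quotes, so strip them before comparing.
    if md5_hex(local_path) != etag.strip('"'):
        return False  # keep the remote object so the download can be retried
    delete(key)
    return True


def fetch_all(objects: Dict[str, Tuple[str, str]],
              download: Callable[[str, str], None],
              delete: Callable[[str], None],
              max_workers: int = 8) -> Dict[str, bool]:
    """`objects` maps key -> (etag, local_path); returns key -> success flag."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            key: pool.submit(fetch_verified, key, etag, path, download, delete)
            for key, (etag, path) in objects.items()
        }
        return {key: future.result() for key, future in futures.items()}
```

With boto3, `download` could for instance wrap `client.download_file(bucket, key, path)` and `delete` could wrap `client.delete_object(Bucket=bucket, Key=key)`; that wiring is left out here to keep the sketch free of external dependencies.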

If you have questions about the storage object we are using, just ask :)

@jdries
Collaborator

jdries commented Apr 2, 2025

5 participants