NCI local filesystem indexed thumbnails are inaccessible from web #520

@jmettes

Example dataset page where the thumbnail does not load:

https://explorer.dea.ga.gov.au/products/ga_ls9c_ard_3/datasets/abe45d2b-a54f-468d-98ea-2ffb30656260

Perhaps a wrapper could be written for NCI that takes these /g/data files and translates them to the corresponding THREDDS location? I'm not sure the /g/data and THREDDS paths always map consistently 1:1, though. 🤷

NCI location: file:///g/data/xu18/ga/ga_ls9c_ard_3/105/068/2022/10/06/ga_ls9c_nbart_3-2-1_105068_2022-10-06_final_thumbnail.jpg
THREDDS location: https://dapds00.nci.org.au/thredds/fileServer/xu18/ga_ls9c_ard_3/105/068/2022/10/06/ga_ls9c_nbart_3-2-1_105068_2022-10-06_final_thumbnail.jpg
Catalog: https://dapds00.nci.org.au/thredds/catalog/xu18/ga_ls9c_ard_3/105/068/2022/10/06/catalog.html
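
As a rough illustration of what such a wrapper might look like, here is a minimal sketch based only on the single example above. The prefix table, `nci_file_to_thredds` and `thredds_catalog_url` are hypothetical names, not anything that exists in the codebase, and the mapping is an assumption: in this example the local path has an extra `ga/` component that the THREDDS path drops, so a plain 1:1 substitution would not work and a per-project prefix table seems safer.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical prefix table, inferred from the one example in this issue.
# The local path contains "ga/" after the project code; THREDDS does not.
THREDDS_PREFIXES = {
    "/g/data/xu18/ga/": "https://dapds00.nci.org.au/thredds/fileServer/xu18/",
}


def nci_file_to_thredds(
    url: str, prefixes: Mapping[str, str] = THREDDS_PREFIXES
) -> Optional[str]:
    """Return the THREDDS fileServer URL for a file:// URL, or None if unmapped."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        return None
    for local_prefix, remote_base in prefixes.items():
        if parsed.path.startswith(local_prefix):
            # Swap the local prefix for the remote base, keep the rest as-is.
            return remote_base + parsed.path[len(local_prefix):]
    return None


def thredds_catalog_url(file_server_url: str) -> str:
    """Derive the catalog page for the directory containing a fileServer URL."""
    directory = file_server_url.rsplit("/", 1)[0]
    return (
        directory.replace("/thredds/fileServer/", "/thredds/catalog/", 1)
        + "/catalog.html"
    )
```

Feeding the NCI location above through `nci_file_to_thredds` gives the THREDDS location listed, and `thredds_catalog_url` on that result gives the catalog URL. Whether the same prefix rule holds for other products or projects is untested.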

This seems possible in principle, since an equivalent mapping is already done for AWS: s3://dea-public-data/... -> https://dea-public-data.s3.ap-southeast-2.amazonaws.com/...

Perhaps, while we're at it, we could replace the location link with the THREDDS one too.

Looks like it's handled here in the code:

def as_external_url(
    url: str, s3_region: str = None, is_base: bool = False
) -> Optional[str]:
    """
    Convert a URL to an externally-visible one.

    >>> import pytest; pytest.skip() # doctests aren't working outside flask context :(
    >>> # Converts s3 to http
    >>> as_external_url('s3://some-data/L2/S2A_OPER_MSI_ARD__A030100_T56LNQ_N02.09/ARD-METADATA.yaml', "ap-southeast-2")
    'https://some-data.s3.ap-southeast-2.amazonaws.com/L2/S2A_OPER_MSI_ARD__A030100_T56LNQ_N02.09/ARD-METADATA.yaml'
    >>> # Other URLs are left as-is
    >>> unconvertable_url = 'file:///g/data/xu18/ga_ls8c_ard_3-1-0_095073_2019-03-22_final.odc-metadata.yaml'
    >>> unconvertable_url == as_external_url(unconvertable_url)
    True
    >>> as_external_url('some/relative/path.txt')
    'some/relative/path.txt'
    >>> # if base uri was none, we may want to return the s3 location instead of the metadata yaml
    """
    parsed = urlparse(url)
    if s3_region and parsed.scheme == "s3":
        # get buckets for which link should be to data location instead of s3 link
        data_location = flask.current_app.config.get("SHOW_DATA_LOCATION", {})
        if parsed.netloc in data_location:
            # remove the first '/'
            path = parsed.path[1:]
            if is_base:
                # if it's the folder url, get the directory path
                path = path[: path.rindex("/") + 1]
                path = f"?prefix={path}"
            return f"https://{data_location.get(parsed.netloc)}/{path}"
        return f"https://{parsed.netloc}.s3.{s3_region}.amazonaws.com{parsed.path}"
    return url
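
For discussion, a `file://` branch could sit alongside the existing `s3` one. The sketch below is a simplified, Flask-free stand-in, not the real function: `as_external_url_sketch` and its `file_prefixes` parameter are hypothetical, the `SHOW_DATA_LOCATION` handling is left out for brevity, and in practice the prefix table would presumably come from app config. Unmapped URLs still fall through unchanged, matching the current behaviour.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse


def as_external_url_sketch(
    url: str,
    s3_region: Optional[str] = None,
    file_prefixes: Optional[Mapping[str, str]] = None,  # hypothetical setting
) -> str:
    """Simplified sketch: s3:// and mapped file:// URLs become https URLs."""
    parsed = urlparse(url)
    if s3_region and parsed.scheme == "s3":
        return f"https://{parsed.netloc}.s3.{s3_region}.amazonaws.com{parsed.path}"
    if file_prefixes and parsed.scheme == "file":
        # Try the longest local prefix first so nested mappings win.
        for local_prefix in sorted(file_prefixes, key=len, reverse=True):
            if parsed.path.startswith(local_prefix):
                return file_prefixes[local_prefix] + parsed.path[len(local_prefix):]
    # Anything else (relative paths, unmapped file URLs) is left as-is.
    return url
```

Keeping the mapping in configuration rather than hard-coding it would let non-NCI deployments leave it empty and see no change.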
