Description
I'm trying to query all available CMIP6 projections for selected models, scenarios and variables, but I get different results depending on which additional parameters I include in the query. Here is an example for clarity.
I connect to the German data center
from pyesgf.logon import LogonManager
from pyesgf.search import SearchConnection

hostname = "esgf-data.dkrz.de"
lm = LogonManager()
lm.logon(
    hostname=hostname,
    bootstrap=True,
    username=username,
    password=password,
    interactive=False,
)
url = "http://{}/esg-search".format(hostname)
conn = SearchConnection(url, distrib=True)
I query all available projections for the CanESM5 model, scenario ssp245, variable zg500 and member_id r1i1p1f1:
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": "zg500",
    "source_id": "CanESM5",
    "member_id": "r1i1p1f1",
    "experiment_id": "ssp245"
}
ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')
print(f'\nFiles found:')
for r in results:
    print(r.dataset_id)
In theory I get 0 counts (as reported by ctx.hit_count), but the results show 2 instances for this specific query (one on the Canadian server and one on the American server):
Number of counts: 0
Files found:
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov
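As far as I can tell, these two hits are the same dataset held on two data nodes. Below is a minimal sketch for splitting the printed IDs into facets and grouping the replicas. The facet names and their order are my assumption, read off the IDs above, and parse_dataset_id and group_replicas are hypothetical helpers, not part of pyesgf.

```python
from collections import defaultdict

# Facet order assumed from the dataset IDs printed above (not taken from pyesgf).
FACETS = ("mip_era", "activity_id", "institution_id", "source_id",
          "experiment_id", "member_id", "table_id", "variable_id",
          "grid_label", "version")


def parse_dataset_id(dataset_id):
    """Split 'a.b.c...|node' into a dict of facets plus the data node."""
    drs, _, node = dataset_id.partition("|")
    parts = drs.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"unexpected dataset id: {dataset_id!r}")
    facets = dict(zip(FACETS, parts))
    facets["data_node"] = node
    return facets


def group_replicas(dataset_ids):
    """Map each dataset (ID without the node) to the sorted nodes holding it."""
    nodes = defaultdict(set)
    for dataset_id in dataset_ids:
        drs, _, node = dataset_id.partition("|")
        nodes[drs].add(node)
    return {drs: sorted(held_on) for drs, held_on in nodes.items()}


ids = [
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca",
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov",
]
replicas = group_replicas(ids)
for drs, held_on in replicas.items():
    print(drs, "->", held_on)
print("table_id:", parse_dataset_id(ids[0])["table_id"])
```

Note that the table part of these IDs is AERday, whereas the pr datasets further down use day.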
If I query without restricting to the specific member_id (r1i1p1f1):
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": "zg500",
    "source_id": "CanESM5",
    #"member_id": "r1i1p1f1",
    "experiment_id": "ssp245",
}
ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')
print(f'\nFiles found:')
for r in results:
    if "r1i1p1f1" in r.dataset_id:
        print(r.dataset_id)
I now get 4 counts (?), but the same instances as before.
Number of counts: 4
Files found:
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov
However, if I extend the query to include pr in addition to zg500...
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": ["pr", "zg500"],
    "source_id": "CanESM5",
    #"member_id": "r1i1p1f1",
    "experiment_id": "ssp245",
}
ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')
print(f'\nFiles found:')
for r in results:
    if "r1i1p1f1" in r.dataset_id:
        print(r.dataset_id)
I no longer get any instances for zg500 (though I do get some for pr):
Number of counts: 54
Files found:
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf3.dkrz.de
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190306|esgf.ceda.ac.uk
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf.ceda.ac.uk
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf.nci.org.au
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190306|esgf.nci.org.au
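One workaround I could imagine is to run one query per value and merge the results myself, rather than passing a list. This is only a sketch, and I have not confirmed that it avoids the inconsistency. It uses only the calls already shown above (new_context, search, dataset_id); search_each is a hypothetical helper name.

```python
def search_each(conn, base_fields, facet, values):
    """Run one search per value of `facet` and merge the dataset IDs.

    `conn` is expected to behave like the SearchConnection used above:
    conn.new_context(**fields).search() yields objects with a dataset_id.
    """
    found = set()
    for value in values:
        # Copy the shared constraints and set the single facet value.
        fields = dict(base_fields, **{facet: value})
        ctx = conn.new_context(**fields)
        for result in ctx.search():
            found.add(result.dataset_id)
    return sorted(found)


# Intended usage with the connection from above (not executed here):
# base = {"project": "CMIP6", "frequency": "day",
#         "source_id": "CanESM5", "experiment_id": "ssp245"}
# ids = search_each(conn, base, "variable", ["pr", "zg500"])
```

The same helper would apply to any facet given as a list, for example experiment_id.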
Or, for example, if I extend the query to several scenarios:
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": "zg500",
    "source_id": "CanESM5",
    #"member_id": "r1i1p1f1",
    "experiment_id": ["historical","ssp245","ssp585"],
}
ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')
print(f'\nFiles found:')
for r in results:
    if "r1i1p1f1" in r.dataset_id:
        print(r.dataset_id)
I get no results for the SSP scenarios, and only 1 for historical.
Number of counts: 15
Files found:
CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca
Is this behaviour expected? My analysis requires cross-parameter searches, so that I can identify which simulations are available across a given list of variables and scenarios.
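To make the goal concrete, here is a minimal sketch of the cross-parameter check I would like to run on the returned dataset IDs. common_members is a hypothetical helper, and the positions of experiment, member and variable within the ID are assumed from the IDs printed above.

```python
from collections import defaultdict


def common_members(dataset_ids, variables, experiments):
    """Return member_ids that have every (variable, experiment) combination."""
    wanted = {(v, e) for v in variables for e in experiments}
    seen = defaultdict(set)
    for dataset_id in dataset_ids:
        # Drop the '|data_node' suffix, then split the dotted ID.
        parts = dataset_id.split("|")[0].split(".")
        # Positions assumed from the printed IDs: 4=experiment, 5=member, 7=variable.
        experiment, member, variable = parts[4], parts[5], parts[7]
        seen[member].add((variable, experiment))
    return sorted(m for m, combos in seen.items() if wanted <= combos)


ids = [
    "CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca",
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov",
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf3.dkrz.de",
]
zg_only = common_members(ids, ["zg500"], ["historical", "ssp245"])
both = common_members(ids, ["pr", "zg500"], ["historical", "ssp245"])
print(zg_only)
print(both)
```

This only works if the search returns every matching dataset, which is why the inconsistent results above are a problem for me.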