No consistent result from different queries (CMIP6)

I'm trying to query all available CMIP6 projections for selected models, scenarios and variables but I'm getting different results depending on the additional parameters I use in the query. I write an example for clarity

I connect to the German data center (DKRZ):

```
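from pyesgf.logon import LogonManager
from pyesgf.search import SearchConnection

hostname = "esgf-data.dkrz.de"
username = "..."  # ESGF account username (placeholder, set elsewhere)
password = "..."  # ESGF account password (placeholder, set elsewhere)

# Log on to the ESGF node with the credentials above
lm = LogonManager()
lm.logon(
    hostname=hostname,
    bootstrap=True,
    username=username,
    password=password,
    interactive=False,
)

# distrib=True searches across all federated ESGF index nodes
url = "http://{}/esg-search".format(hostname)
conn = SearchConnection(url, distrib=True)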
```

I query all available projections for the `CanESM5` model, scenario `ssp245`, variable `zg500` and member_id `r1i1p1f1`:

```
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": "zg500",
    "source_id": "CanESM5",
    "member_id": "r1i1p1f1",
    "experiment_id": "ssp245"
}

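ctx = conn.new_context(**fields)
counts = ctx.hit_count   # number of matches reported by the index (a property, not a method)
results = ctx.search()   # iterate over the matching datasets
print(f'Number of counts: {counts}')

print('\nFiles found:')
for r in results:
    print(r.dataset_id)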
```

`ctx.hit_count` reports 0 hits, yet iterating over the results returns 2 datasets for this query (one on the Canadian data node and one on the American one):
```
Number of counts: 0

Files found:
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov
```
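Both lines above share the same instance ID and differ only in the data node after `|`, so they appear to be two copies (replicas) of a single dataset rather than two distinct simulations. When comparing counts across queries it may help to collapse them client-side. A minimal sketch, assuming every `dataset_id` has the `<instance_id>|<data_node>` form seen in this output; `group_replicas` is a hypothetical helper, not part of pyesgf:

```python
def group_replicas(dataset_ids):
    """Group dataset ids by instance id, listing every data node that serves it.

    Assumes ids of the form '<instance_id>|<data_node>' (hypothetical helper).
    """
    nodes_by_instance = {}
    for dataset_id in dataset_ids:
        # partition() splits on the first '|' only
        instance_id, _, data_node = dataset_id.partition("|")
        nodes_by_instance.setdefault(instance_id, []).append(data_node)
    return nodes_by_instance


found = [
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca",
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov",
]

replicas = group_replicas(found)
for instance_id, nodes in replicas.items():
    print(instance_id, "served by", nodes)
```

With a live search, the input would be `[r.dataset_id for r in results]`.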

If I drop the `member_id` constraint (`r1i1p1f1`) from the query and filter on it client-side instead:

```
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": "zg500",
    "source_id": "CanESM5",
    #"member_id": "r1i1p1f1",
    "experiment_id": "ssp245",
}

ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')

print('\nFiles found:')
for r in results:
    # member_id is no longer a search constraint, so filter it client-side
    if "r1i1p1f1" in r.dataset_id:
        print(r.dataset_id)
```

I now get a count of 4 (why?), but the same `r1i1p1f1` datasets as before:

```
Number of counts: 4

Files found:
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov
```
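Since `member_id` is no longer constrained on the server in this query, the hit count can include datasets for other members, while the loop only prints `r1i1p1f1`; the two numbers are therefore not directly comparable. To make the client-side filter explicit, here is a sketch that splits a CMIP6 dataset ID into its components. `DRS_FIELDS`, `parse_dataset_id` and `filter_by_facet` are hypothetical names, and the field order assumes the standard CMIP6 dataset ID layout (`mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version`):

```python
# Assumed CMIP6 dataset ID layout (one name per dot-separated component)
DRS_FIELDS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id):
    """Split '<instance_id>|<data_node>' into a dict of CMIP6 facets."""
    instance_id, _, data_node = dataset_id.partition("|")
    parts = instance_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(f"unexpected CMIP6 dataset id: {dataset_id!r}")
    record = dict(zip(DRS_FIELDS, parts))
    record["data_node"] = data_node
    return record


def filter_by_facet(dataset_ids, **wanted):
    """Keep only ids whose parsed facets equal every keyword value given."""
    kept = []
    for dataset_id in dataset_ids:
        record = parse_dataset_id(dataset_id)
        if all(record.get(key) == value for key, value in wanted.items()):
            kept.append(dataset_id)
    return kept


record = parse_dataset_id(
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|esgf-data1.llnl.gov"
)
print(record["table_id"], record["variable_id"], record["data_node"])
```

`filter_by_facet([r.dataset_id for r in results], member_id="r1i1p1f1")` plays the role of the `if "r1i1p1f1" in r.dataset_id` check, but compares the exact facet value rather than any substring of the ID.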

However, if I extend the query to include `pr` in addition to `zg500`:

```
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": ["pr", "zg500"],
    "source_id": "CanESM5",
    #"member_id": "r1i1p1f1",
    "experiment_id": "ssp245",
}

ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')

print('\nFiles found:')
for r in results:
    # member_id is no longer a search constraint, so filter it client-side
    if "r1i1p1f1" in r.dataset_id:
        print(r.dataset_id)
```

the `zg500` datasets no longer appear at all (I do get some for `pr`, though):

```
Number of counts: 54

Files found:
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf3.dkrz.de
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190306|esgf.ceda.ac.uk
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf.ceda.ac.uk
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf.nci.org.au
CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190306|esgf.nci.org.au
```
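To check whether the problem is tied to list-valued facets, one option is to issue one query per combination, with every facet single-valued, and merge the `dataset_id`s afterwards. This is a workaround under that assumption, not a confirmed fix. The sketch below only builds the constraint dictionaries; `expand_constraints` is a hypothetical helper, and each dictionary it returns would then be passed to `conn.new_context(**fields)` as in the snippets above:

```python
from itertools import product


def expand_constraints(base, **multi):
    """Return one constraint dict per combination of the list-valued facets.

    `base` holds single-valued facets shared by every query; each keyword in
    `multi` maps a facet name to the list of values to iterate over.
    """
    names = list(multi)
    queries = []
    for values in product(*(multi[name] for name in names)):
        fields = dict(base)                # copy so queries don't share state
        fields.update(zip(names, values))  # pin each facet to one value
        queries.append(fields)
    return queries


base = {"project": "CMIP6", "frequency": "day", "source_id": "CanESM5"}
queries = expand_constraints(
    base,
    variable=["pr", "zg500"],
    experiment_id=["historical", "ssp245", "ssp585"],
)
print(len(queries))  # 6
print(queries[0])
```

With a live connection the merge step might be `found = [r.dataset_id for fields in queries for r in conn.new_context(**fields).search()]`. The trade-off is one search request per combination instead of a single query.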

Similarly, if I extend the query to several scenarios:
```
fields = {
    "project": "CMIP6",
    "frequency": "day",
    "variable": "zg500",
    "source_id": "CanESM5",
    #"member_id": "r1i1p1f1",
    "experiment_id": ["historical","ssp245","ssp585"],
}

ctx = conn.new_context(**fields)
counts = ctx.hit_count
results = ctx.search()
print(f'Number of counts: {counts}')

print('\nFiles found:')
for r in results:
    # member_id is no longer a search constraint, so filter it client-side
    if "r1i1p1f1" in r.dataset_id:
        print(r.dataset_id)
```

I get no results for the SSP scenarios, only one for `historical`:
```
Number of counts: 15

Files found:
CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca
```

Is this behaviour expected? My analysis requires cross-parameter searches, so that I can identify which simulations are available across a given list of variables and scenarios.
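Once the `dataset_id`s have been collected (for example from one query per combination), the availability check itself can be done client-side. A minimal sketch, assuming the standard CMIP6 dataset ID layout; `complete_members` is a hypothetical helper that returns the `(source_id, member_id)` pairs having every requested variable in every requested experiment. The sample IDs are taken from the outputs above:

```python
from itertools import product


def complete_members(dataset_ids, variables, experiments):
    """Return sorted (source_id, member_id) pairs that cover every
    requested (variable, experiment) combination.

    Assumes ids shaped like
    mip_era.activity.institution.source.experiment.member.table.variable.grid.version|data_node
    """
    required = set(product(variables, experiments))
    available = {}
    for dataset_id in dataset_ids:
        instance_id = dataset_id.split("|", 1)[0]  # drop the data node
        parts = instance_id.split(".")
        source, experiment, member, variable = parts[3], parts[4], parts[5], parts[7]
        available.setdefault((source, member), set()).add((variable, experiment))
    # keep members whose available pairs include every required pair
    return sorted(key for key, pairs in available.items() if required <= pairs)


found = [
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca",
    "CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.AERday.zg500.gn.v20190429|crd-esgf-drc.ec.gc.ca",
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.pr.gn.v20190429|esgf3.dkrz.de",
]

print(complete_members(found, ["pr", "zg500"], ["ssp245"]))
# [('CanESM5', 'r1i1p1f1')]
print(complete_members(found, ["pr", "zg500"], ["historical", "ssp245"]))
# [] because no pr/historical dataset is in the list
```

Because the bookkeeping uses sets, replicas of the same dataset on several data nodes are counted once.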

No consistent result from different queries (CMIP6) #102

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

No consistent result from different queries (CMIP6) #102

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions