Unexpected number of results for large query #90

@jbusecke

I am exploring using esgf-pyclient to get a list of all retracted CMIP6 datasets (for our automated maintenance of the Pangeo CMIP6 cloud data).

I am trying the following:

from pyesgf.search import SearchConnection
conn = SearchConnection(
    'https://esgf-node.llnl.gov/esg-search',
    distrib=True,
)
ctx = conn.new_context(mip_era='CMIP6', retracted=True, replica=False, fields='id', facets=['doi'])
ctx.hit_count

And I get back a hit count of 691984.

But when I try to extract the list of dataset IDs:

results = ctx.search(batch_size=10000)
retracted = [ds.dataset_id for ds in results]
len(retracted)

The list only has 240000 elements. Such a round number makes me think I am hitting some internal limit here.

Or did I miss something in the above code?
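
One way I could narrow this down is to split the query into smaller sub-queries and check, for each one, whether the reported hit count matches the number of results actually returned. Below is a minimal sketch of that check. The helper name `compare_counts` is hypothetical, and splitting on `activity_id` with the values `CMIP` and `ScenarioMIP` is only an illustrative choice. It assumes nothing beyond `hit_count`, `constrain()` and `search()` on the context.

```python
def compare_counts(ctx, facet, values, batch_size=1000):
    """Compare reported hit counts with returned results per facet value.

    `ctx` is expected to behave like a pyesgf SearchContext, i.e. to offer
    `hit_count`, `constrain()` and `search()`. Returns a list of
    (value, reported_hits, returned, unique_ids) tuples.
    """
    rows = []
    for value in values:
        # constrain() gives a narrower copy of the original query
        sub = ctx.constrain(**{facet: value})
        ids = [ds.dataset_id for ds in sub.search(batch_size=batch_size)]
        # len(set(ids)) exposes duplicates among the returned ids
        rows.append((value, sub.hit_count, len(ids), len(set(ids))))
    return rows


# Usage with the context created above (needs network access):
# for row in compare_counts(ctx, "activity_id", ["CMIP", "ScenarioMIP"]):
#     print(row)
```

If the counts agree for small sub-queries and only diverge for large ones, that would point to a limit on the number of results rather than a mistake in my constraints.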

Any help on this would be greatly appreciated.
