
ENH: Searching within a time interval #623

@aulemahal


(I first posted this as a comment on PR #291, but then realized that the PR had been stale for three years, so I figured a new issue would be more appropriate.)

We ran into this need when writing xscen. Our solution does not yet feel generic enough to be implemented here, but if you want to have a look, a first version is here: https://github.com/Ouranosinc/xscen/blob/b261a04ed73e398a60a0632bdb29be324dc3f5b6/xscen/catalog.py#L899-L998

The idea is similar to the stale PR. You have a "date_start" and a "date_end" column in the catalog. For a given "period", the code returns the rows whose [date_start, date_end] interval overlaps with that period. Because of the limitations of pandas < 2, we had to use pd.Period objects in our catalogs, and the code suffers a bit from this workaround.
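To make the overlap idea concrete, here is a minimal sketch with plain pandas. It is not the xscen implementation: the helper name `rows_overlapping`, the toy catalog and the use of `datetime64` columns (rather than pd.Period) are assumptions made for illustration only.

```python
import pandas as pd


def rows_overlapping(df, start, end, start_col="date_start", end_col="date_end"):
    """Return the rows whose [start_col, end_col] interval overlaps [start, end].

    Hypothetical helper: two closed intervals overlap when each one starts
    before (or exactly when) the other one ends.
    """
    start = pd.Timestamp(start)
    end = pd.Timestamp(end)
    mask = (df[start_col] <= end) & (df[end_col] >= start)
    return df[mask]


# Toy catalog: one variable split temporally across three files.
catalog = pd.DataFrame(
    {
        "path": ["a.nc", "b.nc", "c.nc"],
        "date_start": pd.to_datetime(["1950-01-01", "1960-01-01", "1970-01-01"]),
        "date_end": pd.to_datetime(["1959-12-31", "1969-12-31", "1979-12-31"]),
    }
)

selected = rows_overlapping(catalog, "1955-01-01", "1965-12-31")
print(selected["path"].tolist())
```

With this toy catalog, only the first two files intersect the requested period.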

In addition to a simple "overlap", our function tries to estimate the percentage of the period that is covered by the rows of the dataframe, so that we only return the subset if a significant percentage is covered. This has the restriction that it only makes sense if the rows of the dataset do not overlap temporally, as for a single variable split temporally across multiple files. Requiring a "full overlap" would often be too strict because of the many caveats (different calendars, imprecise date bounds).
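The coverage estimate could look roughly like the sketch below. Again, this is only an illustration and not the xscen code: the helper name `coverage_fraction` and the toy data are hypothetical, the date bounds are treated as plain instants, and the rows are assumed not to overlap each other in time (otherwise the fraction would be overestimated).

```python
import pandas as pd


def coverage_fraction(df, start, end, start_col="date_start", end_col="date_end"):
    """Estimate the fraction of [start, end] covered by the rows of df.

    Hypothetical helper. Assumes the rows do not overlap each other in time.
    """
    start = pd.Timestamp(start)
    end = pd.Timestamp(end)
    # Clip every row to the requested period.
    clipped_start = df[start_col].where(df[start_col] > start, start)
    clipped_end = df[end_col].where(df[end_col] < end, end)
    durations = clipped_end - clipped_start
    # Rows entirely outside the period get a negative duration: count them as 0.
    zero = pd.Timedelta(0)
    durations = durations.where(durations > zero, zero)
    return durations.sum() / (end - start)


files = pd.DataFrame(
    {
        "date_start": pd.to_datetime(["1950-01-01", "1960-01-01", "1980-01-01"]),
        "date_end": pd.to_datetime(["1960-01-01", "1970-01-01", "1990-01-01"]),
    }
)

# The files cover 1955-1970 of the requested 1955-1975 period, i.e. about 75 %.
fraction = coverage_fraction(files, "1955-01-01", "1975-01-01")
print(round(fraction, 2))
```

A caller would then compare the fraction to a threshold of its choosing and return the subset only when the threshold is met.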

I recently tried using datetime64[ms] columns with pandas >= 2, which makes it possible to simplify the function a bit and use more pd.Interval magic. It is here: https://github.com/Ouranosinc/xscen/blob/05054bfbf450c6b332e239e7866f766f51a47ed0/xscen/catalog.py#L892-L974.
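As an illustration of what an interval-based version might look like (a sketch only, not the linked code), the overlap test can be delegated to `pd.IntervalIndex.overlaps`. The helper name `search_period` and the toy data are hypothetical, and the sketch uses default-resolution datetime columns rather than datetime64[ms].

```python
import pandas as pd


def search_period(df, start, end, start_col="date_start", end_col="date_end"):
    """Return the rows whose time interval overlaps [start, end].

    Hypothetical helper built on pandas interval objects.
    """
    period = pd.Interval(pd.Timestamp(start), pd.Timestamp(end), closed="both")
    bounds = pd.IntervalIndex.from_arrays(df[start_col], df[end_col], closed="both")
    # overlaps() returns a boolean array, one entry per row.
    return df[bounds.overlaps(period)]


cat_df = pd.DataFrame(
    {
        "path": ["a.nc", "b.nc", "c.nc"],
        "date_start": pd.to_datetime(["1950-01-01", "1960-01-01", "1970-01-01"]),
        "date_end": pd.to_datetime(["1959-12-31", "1969-12-31", "1979-12-31"]),
    }
)

found = search_period(cat_df, "1955-01-01", "1965-12-31")
print(found["path"].tolist())
```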

There are still some caveats and open questions, I think.

  • How do we tell intake-esm which columns are the time bounds?
  • How do we solve the "coverage" issue neatly?

With input from the intake-esm devs and users, we (Ouranos) could consider investing some time into adapting and upstreaming our solution as we would be more than happy to make xscen thinner.
