Skip to content

Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) #8776

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

c-herz
Copy link

@c-herz c-herz commented Apr 19, 2025

This PR adds preliminary support for matching ISO 8601 timestamps with the date: archive filter, and intends to begin addressing the requirements of #8715.

Timestamps (except for Unix epoch forms) are currently assumed to be in the user's local timezone and converted to UTC internally. The following formats are currently supported:

  1. YYYY
  2. YYYY-MM
  3. YYYY-MM-DD
  4. YYYY-MM-DDTHH -> matches a 1-hour interval
  5. YYYY-MM-DDTHH:MM -> matches a 1-minute interval
  6. YYYY-MM-DDTHH:MM:SS -> matches a 1-second interval
  7. YYYY-MM-DDTHH:MM:SS.ffff -> matches an exact timestamp, including fractional seconds
  8. @123456789 -> Unix epoch (interpreted as UTC)

(Is the fractional-second exact match useful in practice? Feedback welcome on this.)

Still in progress:

  • Support for wildcard-style matching (date:2025-*-01)
  • User-specified timezones and offsets
  • Keywords like now, oldest, newest, etc
  • Explicit duration intervals
  • RFC 3339 / RFC 9557 format support
  • Test/documentation coverage

@c-herz c-herz changed the title Add preliminary support for ISO-8601 timestamps via date: archive match pattern Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) Apr 19, 2025
@PhrozenByte
Copy link
Contributor

PhrozenByte commented Apr 20, 2025

Thanks for picking this up! ❤️

  1. YYYY-MM-DDTHH:MM:SS.ffff -> matches an exact timestamp, including fractional seconds
  2. @123456789 -> Unix epoch (interpreted as UTC)

(Is the fractional-second exact match useful in practice? Feedback welcome on this.)

What's the precision of an archive's creation time? From the code I assume it's with fractional seconds, right? I absolutely agree with you then: There should be a variant that is guaranteed to match a single archive. I feel like that Unix timestamps should also optionally support fractional seconds then.

  1. YYYY-MM-DDTHH -> matches a 1-hour interval

From reading the code I derive that the 1-hour interval is matched exclusively, i.e. it's actually matching any archive within 00:59:59.9999… hours, correct? Perfect 👍

# Year/Year-month/Year-month-day
parts = expr.split("-")
try:
    if len(parts) == 1:                    # YYYY
        year = int(parts[0])

Even though I like the simplicity, I feel like that Borg should be pretty strict about the format, because being less strict easily leads to ambiguity. For example, is date:0 supposed to match any archive created in year 0? Probably, but it gets way less clear with (now deprecated) truncated ISO8601 dates: What does the pattern date:25-1 describe? January of the year 25, January 1925, or January 2025?

If there's no library that can be used, I always imagined that the code would basically revolve around a single, rather strict regex with bottom-up optional groups for year, month, day, hour, minutes, seconds, and fractal seconds, or * as wildcard, supplemented by another regex to match periods, and simple matchers for Unix timestamps and keywords. I'm not saying that this is the best approach, that's just what I imagined while writing #8715.

In general I like to encourage creating extensive unit tests as early as possible. It's elegant and simple code now (🚀 👍), but complexity will increase greatly when adding more and more features.

Note: I can read the code, but can't do an actual code review - for that I just don't known enough of Borg's code.

Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Some minor stuff I found...

"""
Turn a date: expression into a predicate ts->bool.
Supports:
1) Full ISO‑8601 timestamps with minute (and optional seconds/fraction)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So HH:MM:SS would be an interval of 1s?

Comment on lines 217 to 219
2) Hour-only: YYYY‑MM‑DDTHH -> interval of 1 hour
3) Minute-only: YYYY‑MM‑DDTHH:MM -> interval of 1 minute
4) YYYY, YYYY‑MM, YYYY‑MM‑DD -> day/month/year intervals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to list all patterns, please keep the list sorted by length of interval and use 1 pattern/interval per line.

hour_re = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}$")
if hour_re.match(expr):
start = parse_local_timestamp(expr + ":00:00", tzinfo=timezone.utc)
return interval_predicate(start, start + timedelta(hours=1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use 1 regex with group names ((?P<name>...) that covers 1) .. 4) and also YYYY, YYYY-MM, YYYY-MM-DD cases from below.

After a single m = re.match(regex, expr), you can check m.groupdict() in the right order (fraction, S, M, H, d, m, y) to determine which case you have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use re.VERBOSE so you can have a multi-line, commented regex for this.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 21, 2025

BTW, if you install the pre-commit hook, you can have your commits automatically formatted.

https://borgbackup.readthedocs.io/en/stable/development.html#building-a-development-environment

Comment on lines 254 to 260
(?P<fraction>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) # full timestamp with fraction
| (?P<second> \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) # no fraction
| (?P<minute> \d{4}-\d{2}-\d{2}T\d{2}:\d{2}) # minute precision
| (?P<hour> \d{4}-\d{2}-\d{2}T\d{2}) # hour precision
| (?P<day> \d{4}-\d{2}-\d{2}) # day precision
| (?P<month> \d{4}-\d{2}) # month precision
| (?P<year> \d{4}) # year precision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach.

What I meant was rather something like (simplified to cover only YYYY and YYYY-MM here as an example):

(?P<year>\d{4})
(-
 (?P<month>\d{2})
)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes this would've been much simpler. Will try to work on refactoring it to this approach tomorrow.

(?:
:(?P<minute>\d{2}|\*) # minute (MM or *)
(?:
:(?P<second>\d{2}(?:\.\d+)?|\*) # second (SS or SS.fff or *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, that is a nice regex now. :-)

You could also deal with fractional seconds as with all other components: it is a optional component, so you can also match it with a named group and later check the groupdict.

Comment on lines +306 to +312
r"(?:(?P<years>\d+)Y)?"
r"(?:(?P<months>\d+)M)?"
r"(?:(?P<weeks>\d+)W)?"
r"(?:(?P<days>\d+)D)?"
r"(?:(?P<hours>\d+)h)?"
r"(?:(?P<minutes>\d+)m)?"
r"(?:(?P<seconds>\d+)s)?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't we use to have lowercase ymwd and uppercase HMS?

Copy link
Contributor

@PhrozenByte PhrozenByte Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c-herz, is that a custom format, or did I miss an ISO 8601 or RFC update? I know ISO-8601's P3Y6M4DT12H30M5S (P designator, 3 years, 6 months, 4 days, T time separator, 12 hours, 30 minutes, 5 seconds, all being optional) format. I honestly don't like that "official" format very much (it's so cumbersome…), especially in regards to using M for both months and minutes (it's still unambiguous due to the T separator), but I feel like we should support it, because it's an official part of ISO 8601.

However, I kinda like the idea of additionally supporting our own format. Like, why not support 12:34:56 (or similar, just a quick idea) to specify a 12 hours, 34 minutes, 56 seconds duration? Why not also support (space) as alternative to T to separate times? The designators could ignore case (i.e. also supporting 7d for 7 days), and we could allow a space after each term. This might go too far though. In general, I'm not sure whether we should put things into the "official" P designator, or rather additionally support our own (like the suggested D, e.g. D 3y 6m 4d 12:30:05?). Is there maybe another common or even formalized/standardized (like another RFC; I did some research, unfortunately I didn't find anything official or quasi-official) format? WDYT?

# ISO week date: YYYY-Www or YYYY-Www-D
(?P<isoweek_year>\d{4})-W(?P<isoweek_week>\d{2})(?:-(?P<isoweek_day>\d))?
| # Ordinal date: YYYY-DDD
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit silly, but it just got me thinking: Could we support ordinal days with wildcard years? 🤔 Same for ISO weeks, possibly even *-W*-5 (wow, that looks crazy 😆) to match all archives created on a Friday? Not sure whether users would use it, but if it's possible? WDYT?

| # Ordinal date: YYYY-DDD
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3})
| # Unix epoch
@(?P<epoch>\d+)
Copy link
Contributor

@PhrozenByte PhrozenByte Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also supporting fractal seconds here would be amazing! ❤️

Side note: I might read the regex wrong, but this also means @1745577106[Europe/Berlin] (or any other TZ format) is supported? AFAIK Unix timestamps are UTC per definition, right? Or is the TZ info used later for something else?

Also supports:
TIMESTAMP/TIMESTAMP
TIMESTAMP/DURATION
DURATION/TIMESTAMP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! ❤️

Just DURATION (i.e. without a TIMESTAMP) isn't supported yet, right? Just specifying a duration is helpful to match the latest archives relative to "now" (which could be another useful keyword).

Copy link

codecov bot commented Apr 25, 2025

Codecov Report

Attention: Patch coverage is 90.71038% with 17 lines in your changes missing coverage. Please review.

Project coverage is 81.62%. Comparing base (d1899c1) to head (9cb5e5f).
Report is 29 commits behind head on master.

Files with missing lines Patch % Lines
src/borg/helpers/time.py 88.66% 11 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8776      +/-   ##
==========================================
- Coverage   81.89%   81.62%   -0.28%     
==========================================
  Files          74       74              
  Lines       13324    13517     +193     
  Branches     1968     2008      +40     
==========================================
+ Hits        10912    11033     +121     
- Misses       1750     1802      +52     
- Partials      662      682      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@c-herz c-herz marked this pull request as ready for review April 26, 2025 00:45
@ThomasWaldmann
Copy link
Member

before doing further changes, please rebase on current master branch to get the cython workaround. otherwise, builds will fail.

@ThomasWaldmann
Copy link
Member

Guess I'ld like to merge this for next beta, can we finish this until then?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants