-
-
Notifications
You must be signed in to change notification settings - Fork 766
Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) #8776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks for picking this up! ❤️
What's the precision of an archive's creation time? From the code I assume it's with fractional seconds, right? I absolutely agree with you then: There should be a variant that is guaranteed to match a single archive. I feel like that Unix timestamps should also optionally support fractional seconds then.
From reading the code I derive that the 1-hour interval is matched exclusively, i.e. it's actually matching any archive within 00:59:59.9999… hours, correct? Perfect 👍
Even though I like the simplicity, I feel like that Borg should be pretty strict about the format, because being less strict easily leads to ambiguity. For example, is If there's no library that can be used, I always imagined that the code would basically revolve around a single, rather strict regex with bottom-up optional groups for year, month, day, hour, minutes, seconds, and fractal seconds, or In general I like to encourage creating extensive unit tests as early as possible. It's elegant and simple code now (🚀 👍), but complexity will increase greatly when adding more and more features. Note: I can read the code, but can't do an actual code review - for that I just don't known enough of Borg's code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Some minor stuff I found...
src/borg/helpers/time.py
Outdated
""" | ||
Turn a date: expression into a predicate ts->bool. | ||
Supports: | ||
1) Full ISO‑8601 timestamps with minute (and optional seconds/fraction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So HH:MM:SS would be an interval of 1s?
src/borg/helpers/time.py
Outdated
2) Hour-only: YYYY‑MM‑DDTHH -> interval of 1 hour | ||
3) Minute-only: YYYY‑MM‑DDTHH:MM -> interval of 1 minute | ||
4) YYYY, YYYY‑MM, YYYY‑MM‑DD -> day/month/year intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to list all patterns, please keep the list sorted by length of interval and use 1 pattern/interval per line.
src/borg/helpers/time.py
Outdated
hour_re = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}$") | ||
if hour_re.match(expr): | ||
start = parse_local_timestamp(expr + ":00:00", tzinfo=timezone.utc) | ||
return interval_predicate(start, start + timedelta(hours=1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just use 1 regex with group names ((?P<name>...)
that covers 1) .. 4) and also YYYY, YYYY-MM, YYYY-MM-DD cases from below.
After a single m = re.match(regex, expr)
, you can check m.groupdict()
in the right order (fraction, S, M, H, d, m, y) to determine which case you have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use re.VERBOSE so you can have a multi-line, commented regex for this.
BTW, if you install the pre-commit hook, you can have your commits automatically formatted. https://borgbackup.readthedocs.io/en/stable/development.html#building-a-development-environment |
Refactored timestamp parsing to use `groupdict` and verbose regex, as recommended. # Conflicts: # src/borg/helpers/time.py
src/borg/helpers/time.py
Outdated
(?P<fraction>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) # full timestamp with fraction | ||
| (?P<second> \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) # no fraction | ||
| (?P<minute> \d{4}-\d{2}-\d{2}T\d{2}:\d{2}) # minute precision | ||
| (?P<hour> \d{4}-\d{2}-\d{2}T\d{2}) # hour precision | ||
| (?P<day> \d{4}-\d{2}-\d{2}) # day precision | ||
| (?P<month> \d{4}-\d{2}) # month precision | ||
| (?P<year> \d{4}) # year precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting approach.
What I meant was rather something like (simplified to cover only YYYY and YYYY-MM here as an example):
(?P<year>\d{4})
(-
(?P<month>\d{2})
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes this would've been much simpler. Will try to work on refactoring it to this approach tomorrow.
…us testing later today
…mat to pass style checks
…d timezones; added mutual exclusion enforcement to wildcard+epoch
…maintainer recommendation (all tests passing)
…onstruction and interval parsing
… preserve time-of-day in offset_n_months
…g; include tests for oldest/newest scenarios
…ive patterns; include tests
(?: | ||
:(?P<minute>\d{2}|\*) # minute (MM or *) | ||
(?: | ||
:(?P<second>\d{2}(?:\.\d+)?|\*) # second (SS or SS.fff or *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, that is a nice regex now. :-)
You could also deal with fractional seconds as with all other components: it is a optional component, so you can also match it with a named group and later check the groupdict.
r"(?:(?P<years>\d+)Y)?" | ||
r"(?:(?P<months>\d+)M)?" | ||
r"(?:(?P<weeks>\d+)W)?" | ||
r"(?:(?P<days>\d+)D)?" | ||
r"(?:(?P<hours>\d+)h)?" | ||
r"(?:(?P<minutes>\d+)m)?" | ||
r"(?:(?P<seconds>\d+)s)?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't we use to have lowercase ymwd and uppercase HMS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c-herz, is that a custom format, or did I miss an ISO 8601 or RFC update? I know ISO-8601's P3Y6M4DT12H30M5S
(P
designator, 3 years, 6 months, 4 days, T
time separator, 12 hours, 30 minutes, 5 seconds, all being optional) format. I honestly don't like that "official" format very much (it's so cumbersome…), especially in regards to using M
for both months and minutes (it's still unambiguous due to the T
separator), but I feel like we should support it, because it's an official part of ISO 8601.
However, I kinda like the idea of additionally supporting our own format. Like, why not support 12:34:56
(or similar, just a quick idea) to specify a 12 hours, 34 minutes, 56 seconds duration? Why not also support
(space) as alternative to T
to separate times? The designators could ignore case (i.e. also supporting 7d
for 7 days), and we could allow a space after each term. This might go too far though. In general, I'm not sure whether we should put things into the "official" P
designator, or rather additionally support our own (like the suggested D
, e.g. D 3y 6m 4d 12:30:05
?). Is there maybe another common or even formalized/standardized (like another RFC; I did some research, unfortunately I didn't find anything official or quasi-official) format? WDYT?
# ISO week date: YYYY-Www or YYYY-Www-D | ||
(?P<isoweek_year>\d{4})-W(?P<isoweek_week>\d{2})(?:-(?P<isoweek_day>\d))? | ||
| # Ordinal date: YYYY-DDD | ||
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a bit silly, but it just got me thinking: Could we support ordinal days with wildcard years? 🤔 Same for ISO weeks, possibly even *-W*-5
(wow, that looks crazy 😆) to match all archives created on a Friday? Not sure whether users would use it, but if it's possible? WDYT?
| # Ordinal date: YYYY-DDD | ||
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3}) | ||
| # Unix epoch | ||
@(?P<epoch>\d+) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also supporting fractal seconds here would be amazing! ❤️
Side note: I might read the regex wrong, but this also means @1745577106[Europe/Berlin]
(or any other TZ format) is supported? AFAIK Unix timestamps are UTC per definition, right? Or is the TZ info used later for something else?
Also supports: | ||
TIMESTAMP/TIMESTAMP | ||
TIMESTAMP/DURATION | ||
DURATION/TIMESTAMP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! ❤️
Just DURATION
(i.e. without a TIMESTAMP
) isn't supported yet, right? Just specifying a duration is helpful to match the latest archives relative to "now" (which could be another useful keyword).
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8776 +/- ##
==========================================
- Coverage 81.89% 81.62% -0.28%
==========================================
Files 74 74
Lines 13324 13517 +193
Branches 1968 2008 +40
==========================================
+ Hits 10912 11033 +121
- Misses 1750 1802 +52
- Partials 662 682 +20 ☔ View full report in Codecov by Sentry. |
before doing further changes, please rebase on current master branch to get the cython workaround. otherwise, builds will fail. |
Guess I'ld like to merge this for next beta, can we finish this until then? |
This PR adds preliminary support for matching ISO 8601 timestamps with the
date:
archive filter, and intends to begin addressing the requirements of #8715.Timestamps (except for Unix epoch forms) are currently assumed to be in the user's local timezone and converted to UTC internally. The following formats are currently supported:
YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DDTHH
-> matches a 1-hour intervalYYYY-MM-DDTHH:MM
-> matches a 1-minute intervalYYYY-MM-DDTHH:MM:SS
-> matches a 1-second intervalYYYY-MM-DDTHH:MM:SS.ffff
-> matches an exact timestamp, including fractional seconds@123456789
-> Unix epoch (interpreted as UTC)(Is the fractional-second exact match useful in practice? Feedback welcome on this.)
Still in progress:
date:2025-*-01
)now
,oldest
,newest
, etc