Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) #8776

Open: wants to merge 21 commits into base: master
Commits
282d70c
Add preliminary support for ISO-8601 timestamps (no timezones at the …
c-herz Apr 19, 2025
db46cdb
reformatted to pass style checks
degabe Apr 21, 2025
4363bf7
Applied recommended changes from ThomasWald, still working as intende…
degabe Apr 21, 2025
69e8608
fix bug with local timezone attachment not correctly respecting DST
c-herz Apr 21, 2025
5c20d8f
Reformatted for consistency with code style guide
c-herz Apr 22, 2025
6f1bcd4
Added basic test suite for ISO-8601 and Unix timestamp matching
c-herz Apr 22, 2025
4060e94
Merge remote-tracking branch 'origin/dateFilterImprov' into datefilter
c-herz Apr 22, 2025
e9a8c5f
add day-precision filter test for `date:YYYY-MM-DD`
c-herz Apr 22, 2025
470758d
support timezone suffixes in date: patterns and add tests
c-herz Apr 22, 2025
df2d33d
Wildcard working. Done some manual testing, will focus on more rigoro…
degabe Apr 23, 2025
870bf7a
add tests for wildcard support in date: archive match patterns; refor…
c-herz Apr 25, 2025
461df75
fix bug with wildcards in date: match patterns not respecting supplie…
c-herz Apr 25, 2025
9553c35
remove stray testfile.txt
c-herz Apr 25, 2025
409733b
refactor date: pattern parser to use structured bottom-up regex, per …
c-herz Apr 25, 2025
de03806
refactor date: pattern parsing to use helper functions for datetime c…
c-herz Apr 25, 2025
796981c
add explicit time interval matching in date: archive match pattern (w…
c-herz Apr 25, 2025
7b8a194
add duration-based interval support for date: archive match patterns;…
c-herz Apr 25, 2025
8e3f1e4
add support for keyword-based date intervals in archive date: matchin…
c-herz Apr 25, 2025
904853d
refactor time.py: rename internal functions for clarity and consistency
c-herz Apr 25, 2025
6032c4a
add support for ISO week-date and ordinal-date matching in date: arch…
c-herz Apr 25, 2025
9cb5e5f
enhance compile_date_pattern docstring: clarify TIMESTAMP and DURATIO…
c-herz Apr 25, 2025
97 changes: 97 additions & 0 deletions src/borg/helpers/time.py
@@ -185,3 +185,100 @@ def isoformat(self):
def archive_ts_now():
    """return tz-aware datetime obj for current time for usage as archive timestamp"""
    return datetime.now(timezone.utc)  # utc time / utc timezone

class DatePatternError(ValueError):
    """Raised when a date: archive pattern cannot be parsed."""


def local(dt: datetime) -> datetime:
    """Attach the system local timezone to naive dt without converting."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.now().astimezone().tzinfo)
    return dt


def exact_predicate(dt: datetime):
    """Return predicate matching archives whose ts equals dt (UTC)."""
    dt_utc = local(dt).astimezone(timezone.utc)
    return lambda ts: ts == dt_utc


def interval_predicate(start: datetime, end: datetime):
    start_utc = local(start).astimezone(timezone.utc)
    end_utc = local(end).astimezone(timezone.utc)
    return lambda ts: start_utc <= ts < end_utc


def compile_date_pattern(expr: str):
    """
    Turn a date: expression into a predicate ts->bool.
    Supports:
    1) Full ISO‑8601 timestamps with minute (and optional seconds/fraction)

Review comment (Member):

So HH:MM:SS would be an interval of 1s?

    2) Hour-only: YYYY‑MM‑DDTHH -> interval of 1 hour
    3) Minute-only: YYYY‑MM‑DDTHH:MM -> interval of 1 minute
    4) YYYY, YYYY‑MM, YYYY‑MM‑DD -> day/month/year intervals

Review comment (Member):

If you want to list all patterns, please keep the list sorted by length of interval and use 1 pattern/interval per line.

    5) Unix epoch (@123456789) -> exact match
    Naive inputs are assumed local, then converted into UTC.
    TODO: verify working for fractional seconds; add timezone support.
    """
    expr = expr.strip()

    # 1) Full timestamp (with fraction)
    full_re = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+")
    if full_re.match(expr):
        dt = parse_local_timestamp(expr, tzinfo=timezone.utc)
        return exact_predicate(dt)  # no interval, since we have a fractional timestamp

    # 2) Seconds-only
    second_re = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$")
    if second_re.match(expr):
        start = parse_local_timestamp(expr, tzinfo=timezone.utc)
        return interval_predicate(start, start + timedelta(seconds=1))

    # 3) Minute-only
    minute_re = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}$")
    if minute_re.match(expr):
        start = parse_local_timestamp(expr + ":00", tzinfo=timezone.utc)
        return interval_predicate(start, start + timedelta(minutes=1))

    # 4) Hour-only
    hour_re = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}$")
    if hour_re.match(expr):
        start = parse_local_timestamp(expr + ":00:00", tzinfo=timezone.utc)
        return interval_predicate(start, start + timedelta(hours=1))

Review comment (Member):

Maybe just use 1 regex with group names (`(?P<name>...)`) that covers 1) .. 4) and also YYYY, YYYY-MM, YYYY-MM-DD cases from below.

After a single `m = re.match(regex, expr)`, you can check `m.groupdict()` in the right order (fraction, S, M, H, d, m, y) to determine which case you have.

Review comment (Member):

Maybe use re.VERBOSE so you can have a multi-line, commented regex for this.
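
The two suggestions in this thread (one regex with named groups, written with `re.VERBOSE`) could be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: the names `DATE_RE`, `ORDER` and `classify` are hypothetical, and timezone suffixes, `@epoch` values and range validation are deliberately left out.

```python
import re

# Hypothetical pattern: one commented, multi-line regex covering YYYY
# down to fractional seconds. Each finer component is optional and nested
# inside the coarser one, so "2025-04" matches but "2025T10" does not.
DATE_RE = re.compile(
    r"""
    ^
    (?P<year>\d{4})                      # YYYY
    (?:-(?P<month>\d{2})                 # -MM
      (?:-(?P<day>\d{2})                 # -DD
        (?:T(?P<hour>\d{2})              # THH
          (?::(?P<minute>\d{2})          # :MM
            (?::(?P<second>\d{2})        # :SS
              (?:\.(?P<fraction>\d+))?   # .fff
            )?
          )?
        )?
      )?
    )?
    $
    """,
    re.VERBOSE,
)

# Finest component first: the first group that matched names the precision.
ORDER = ("fraction", "second", "minute", "hour", "day", "month", "year")


def classify(expr: str) -> str:
    """Return the name of the finest component present in expr."""
    m = DATE_RE.match(expr.strip())
    if m is None:
        raise ValueError(f"unrecognised date: {expr!r}")
    groups = m.groupdict()
    for name in ORDER:
        if groups[name] is not None:
            return name
    raise AssertionError("unreachable: year is mandatory")
```

With the precision known, a caller could pick the interval length (1 second, 1 minute, and so on) in one place instead of running a separate regex per case.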



    # Unix epoch (@123456789) - Note: We don't support fractional seconds here, since Unix epochs are almost always whole numbers.
    if expr.startswith("@"):
        try:
            epoch = int(expr[1:])
        except ValueError:
            raise DatePatternError(f"invalid epoch: {expr!r}")
        start = datetime.fromtimestamp(epoch, tz=timezone.utc)
        return interval_predicate(start, start + timedelta(seconds=1))  # match within the second

    # Year/Year-month/Year-month-day
    parts = expr.split("-")
    try:
        if len(parts) == 1:  # YYYY
            year = int(parts[0])
            start = datetime(year, 1, 1)
            end = datetime(year + 1, 1, 1)

        elif len(parts) == 2:  # YYYY‑MM
            year, month = map(int, parts)
            start = datetime(year, month, 1)
            end = offset_n_months(start, 1)

        elif len(parts) == 3:  # YYYY‑MM‑DD
            year, month, day = map(int, parts)
            start = datetime(year, month, day)
            end = start + timedelta(days=1)

        else:
            raise DatePatternError(f"unrecognised date: {expr!r}")

    except ValueError as e:
        raise DatePatternError(str(e)) from None

    return interval_predicate(start, end)
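
A note on `local()` as shown in this diff: `datetime.now().astimezone().tzinfo` is a fixed-offset timezone carrying the offset in effect right now, so a naive timestamp from the other side of a DST transition can receive the wrong offset. Commit 69e8608 in the list above reports a DST fix; its actual change is not shown here. The following is only a sketch of one possible approach, assuming Python 3.6+, where calling `astimezone()` on a naive datetime presumes system local time. `attach_local` is a hypothetical name, not part of the PR.

```python
from datetime import datetime, timedelta, timezone


def attach_local(dt: datetime) -> datetime:
    """Interpret a naive dt as system local time, using the offset valid at dt."""
    if dt.tzinfo is None:
        # naive.astimezone() presumes system local time and asks the OS for
        # the UTC offset at that specific date, so DST is taken into account.
        dt = dt.astimezone()
    return dt


def interval_predicate(start: datetime, end: datetime):
    """Return a predicate for the half-open interval [start, end), compared in UTC."""
    start_utc = attach_local(start).astimezone(timezone.utc)
    end_utc = attach_local(end).astimezone(timezone.utc)
    return lambda ts: start_utc <= ts < end_utc


# Day-precision pattern: local midnight on 2025-04-19 up to, but excluding,
# local midnight on 2025-04-20.
start = datetime(2025, 4, 19)
pred = interval_predicate(start, start + timedelta(days=1))
```

Because the interval is half-open, an archive timestamped exactly at the end boundary belongs to the next interval only, so adjacent patterns such as two consecutive days never both match the same archive.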
9 changes: 8 additions & 1 deletion src/borg/manifest.py
@@ -14,7 +14,7 @@
from .constants import * # NOQA
from .helpers.datastruct import StableDict
from .helpers.parseformat import bin_to_hex, hex_to_bin
-from .helpers.time import parse_timestamp, calculate_relative_offset, archive_ts_now
+from .helpers.time import parse_timestamp, calculate_relative_offset, archive_ts_now, compile_date_pattern, DatePatternError
from .helpers.errors import Error, CommandError
from .item import ArchiveItem
from .patterns import get_regex_from_pattern
@@ -198,6 +198,13 @@ def _matching_info_tuples(self, match_patterns, match_end, *, deleted=False):
elif match.startswith("host:"):
    wanted_host = match.removeprefix("host:")
    archive_infos = [x for x in archive_infos if x.host == wanted_host]
elif match.startswith("date:"):
    wanted_date = match.removeprefix("date:")
    try:
        pred = compile_date_pattern(wanted_date)
    except DatePatternError as e:
        raise CommandError(f"Invalid date pattern: {match} ({e})")
    archive_infos = [x for x in archive_infos if pred(x.ts)]
else:  # do a match on the name
    match = match.removeprefix("name:")  # accept optional name: prefix
    regex = get_regex_from_pattern(match)
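
To illustrate how the `date:` branch narrows the archive list, here is a self-contained sketch. `ArchiveInfo` is a hypothetical stand-in (a namedtuple with only `name` and `ts`), and the predicate is built by hand rather than through `compile_date_pattern`, so that the example runs without borg.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for borg's archive info tuples; only the fields used here.
ArchiveInfo = namedtuple("ArchiveInfo", "name ts")

archive_infos = [
    ArchiveInfo("a", datetime(2025, 4, 18, 23, 59, 59, tzinfo=timezone.utc)),
    ArchiveInfo("b", datetime(2025, 4, 19, 0, 0, 0, tzinfo=timezone.utc)),
    ArchiveInfo("c", datetime(2025, 4, 19, 23, 59, 59, tzinfo=timezone.utc)),
    ArchiveInfo("d", datetime(2025, 4, 20, 0, 0, 0, tzinfo=timezone.utc)),
]

# Predicate for the UTC day 2025-04-19 as a half-open interval [start, end).
start = datetime(2025, 4, 19, tzinfo=timezone.utc)
end = start + timedelta(days=1)


def pred(ts: datetime) -> bool:
    return start <= ts < end


# Same filtering shape as the date: branch in the hunk above.
matching = [x.name for x in archive_infos if pred(x.ts)]
print(matching)  # ['b', 'c']
```

Each branch in the hunk reassigns `archive_infos` to the filtered list, so if the branches run once per match pattern (as the `match_patterns` parameter suggests), several patterns would narrow the result successively.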