German weekday "Montag" (Monday) only works with 'PREFER_DATES_FROM': 'future'? #1262

Open
helfrichp opened this issue Apr 2, 2025 · 2 comments

Comments

@helfrichp

Found this strange behaviour: the German weekday "Montag" (Monday) only parses correctly with 'PREFER_DATES_FROM': 'future'.

Not sure what's going on here. All other weekdays work as expected, though.

    import dateparser.search  # search_dates lives in the dateparser.search submodule

    # Same input and language, three different PREFER_DATES_FROM values
    all_found_dates = dateparser.search.search_dates("Montag", languages=['de'], settings={'PREFER_DATES_FROM': 'current_period'})
    print(all_found_dates)
    all_found_dates = dateparser.search.search_dates("Montag", languages=['de'], settings={'PREFER_DATES_FROM': 'past'})
    print(all_found_dates)
    all_found_dates = dateparser.search.search_dates("Montag", languages=['de'], settings={'PREFER_DATES_FROM': 'future'})
    print(all_found_dates)

Output:

    [('Montag', datetime.datetime(2025, 12, 31, 0, 0))]
    [('Montag', datetime.datetime(2025, 12, 31, 0, 0))]
    [('Montag', datetime.datetime(2025, 4, 7, 0, 0))]

synrg commented Apr 4, 2025

Problems with parsing a weekday in the past at the beginning of the month are not language-specific. I show a more comprehensive reproducer below with English.

The date on which the test is run is relevant to the problem. With settings={'PREFER_DATES_FROM': 'past'}, parsing a weekday that hasn't yet occurred in the current month unexpectedly returns a date in the future instead of one in the past.

Our workaround is to revert to an older version. 1.1.8 is the last release that returns a date in the past when the input is a weekday that hasn't yet occurred in the current month.

dateparser 1.2.0 and 1.2.1

Note: April 1, 2025 is a Tuesday. Today (the date of this test) is a Friday. The only days that work are Tuesday, Wednesday, and Thursday.

>>> import dateparser
>>> weekdays = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past'}}
>>> [dateparser.parse(wday, **kwargs) for wday in weekdays]
[datetime.datetime(2025, 4, 30, 0, 0), datetime.datetime(2025, 12, 31, 0, 0), datetime.datetime(2025, 4, 1, 0, 0), datetime.datetime(2025, 4, 2, 0, 0), datetime.datetime(2025, 4, 3, 0, 0), datetime.datetime(2025, 4, 28, 0, 0), datetime.datetime(2025, 4, 29, 0, 0)]
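
Because the outcome depends on the run date, it may help to pin the reference date when reproducing this. The following is only a sketch: `future_results` is a hypothetical helper, and it assumes that dateparser's `RELATIVE_BASE` setting is enough to stand in for "today" (I have not confirmed that the bug reproduces identically with a pinned base). The `parse` argument defaults to `dateparser.parse` and exists only so the check can be exercised without the library.

```python
from datetime import datetime

WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
            'Thursday', 'Friday', 'Saturday']


def future_results(inputs, now, parse=None):
    """Return (text, parsed) pairs whose parsed date falls after `now`.

    Hypothetical helper: with PREFER_DATES_FROM='past', anything returned
    here is a candidate instance of the problem described in this issue.
    """
    if parse is None:
        import dateparser  # third-party; imported lazily
        parse = dateparser.parse
    settings = {
        'PREFER_DATES_FROM': 'past',
        'RELATIVE_BASE': now,  # assumption: pins "today" for the parser
    }
    flagged = []
    for text in inputs:
        parsed = parse(text, settings=settings)
        # Naive datetimes are assumed on both sides of the comparison.
        if parsed is not None and parsed > now:
            flagged.append((text, parsed))
    return flagged
```

Calling `future_results(WEEKDAYS, datetime(2025, 4, 4))` would then list the weekdays that come back in the future relative to the date of the test above.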

dateparser 1.1.8

The parsed dates for each day of the week are all in the past, as expected: either weekdays earlier in the current month (Tuesday through Thursday) or days in the previous month (Friday through Monday).

>>> import dateparser
>>> weekdays = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past'}}
>>> [dateparser.parse(wday, **kwargs) for wday in weekdays]
[datetime.datetime(2025, 3, 30, 0, 0), datetime.datetime(2025, 3, 31, 0, 0), datetime.datetime(2025, 4, 1, 0, 0), datetime.datetime(2025, 4, 2, 0, 0), datetime.datetime(2025, 4, 3, 0, 0), datetime.datetime(2025, 3, 28, 0, 0), datetime.datetime(2025, 3, 29, 0, 0)]
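
For reference, the 1.1.8 results above match a plain "most recent occurrence strictly before today" calculation. Below is a minimal standard-library sketch of that expectation. It is not dateparser's implementation, `previous_weekday` is a hypothetical helper, and it only handles English weekday names.

```python
from datetime import datetime, timedelta

# Index positions match datetime.weekday(): Monday is 0, Sunday is 6.
WEEKDAY_INDEX = {
    'monday': 0, 'tuesday': 1, 'wednesday': 2, 'thursday': 3,
    'friday': 4, 'saturday': 5, 'sunday': 6,
}


def previous_weekday(name, now):
    """Return midnight of the latest `name` weekday strictly before `now`'s date."""
    target = WEEKDAY_INDEX[name.lower()]
    delta = (now.weekday() - target) % 7
    if delta == 0:
        # Same weekday as today: go back a full week rather than return today.
        delta = 7
    day = now.date() - timedelta(days=delta)
    return datetime(day.year, day.month, day.day)


now = datetime(2025, 4, 4)  # the Friday on which the tests above were run
print(previous_weekday('Monday', now))  # 2025-03-31 00:00:00
print(previous_weekday('Friday', now))  # 2025-03-28 00:00:00
```

A helper like this could also serve as a fallback for bare weekday inputs while the parser's behaviour is unresolved.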

synrg commented Apr 4, 2025

This may be part of the same issue so I mention it here in case the two problems share a common root cause.

We have a similar problem with dates in the future being returned for months that have not yet occurred in the current year when we request past dates. I show both last and first day of the month preferred, as that matches our use cases (for implementing 'since' and 'until' qualifiers in a query language) and affects the outcome.

That is, the last day of April 2025 hasn't yet occurred, yet when we ask for a date in the past for 'April' with the last day of the month preferred, dateparser currently returns April 30, 2025, a date in the future. In the second test, with the first day of the month preferred, it returns April 1, 2025, which is in the past as expected. I don't know whether the first result is expected behaviour or not. It's not what we expected, though.

dateparser 1.1.7, 1.1.8, 1.2.0, 1.2.1

>>> import dateparser
>>> months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': 'last'}}
>>> [dateparser.parse(month, **kwargs) for month in months]
[datetime.datetime(2025, 1, 31, 0, 0), datetime.datetime(2025, 2, 28, 0, 0), datetime.datetime(2025, 3, 31, 0, 0), datetime.datetime(2025, 4, 30, 0, 0), datetime.datetime(2024, 5, 31, 0, 0), datetime.datetime(2024, 6, 30, 0, 0), datetime.datetime(2024, 7, 31, 0, 0), datetime.datetime(2024, 8, 31, 0, 0), datetime.datetime(2024, 9, 30, 0, 0), datetime.datetime(2024, 10, 31, 0, 0), datetime.datetime(2024, 11, 30, 0, 0), datetime.datetime(2024, 12, 31, 0, 0)]
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': 'first'}}
>>> [dateparser.parse(month, **kwargs) for month in months]
[datetime.datetime(2025, 1, 1, 0, 0), datetime.datetime(2025, 2, 1, 0, 0), datetime.datetime(2025, 3, 1, 0, 0), datetime.datetime(2025, 4, 1, 0, 0), datetime.datetime(2024, 5, 1, 0, 0), datetime.datetime(2024, 6, 1, 0, 0), datetime.datetime(2024, 7, 1, 0, 0), datetime.datetime(2024, 8, 1, 0, 0), datetime.datetime(2024, 9, 1, 0, 0), datetime.datetime(2024, 10, 1, 0, 0), datetime.datetime(2024, 11, 1, 0, 0), datetime.datetime(2024, 12, 1, 0, 0)]

I gave up testing versions older than 1.1.7. It looks like it may always have behaved this way, and in any case we would not want to revert to a release older than that, even if one of them did work the way we think it should.

Although I don't want to get into reproducers for it, parsing an unqualified year, e.g. 2025, as an end date that is supposed to be in the past also seems awkward in the current implementation of dateparser.

Workaround

The only workaround I can think of off the top of my head is:

  • parse the date portion of the user's input (e.g. 'April' for 'since April' or 'until April') with preferred day of month determined by the qualifier 'since' ('first') or 'until' ('last')
  • for the 'until' case, since the parser may incorrectly return a date in the future
    • compare the resulting date with the current date
    • if the result is in the future, try parsing the input again with settings={'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': 'current'}
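
The steps above could be sketched roughly as follows. This is untested against real user input: `parse_bound` is a hypothetical name, passing `RELATIVE_BASE` is my own addition so that the parser and the comparison share the same "now", and the `parse` argument (defaulting to `dateparser.parse`) is there only so the control flow can be checked without the library.

```python
from datetime import datetime


def parse_bound(text, qualifier, now=None, parse=None):
    """Parse `text` as a 'since' or 'until' bound, retrying once for 'until'.

    Hypothetical helper following the workaround steps listed above.
    """
    if qualifier not in ('since', 'until'):
        raise ValueError("qualifier must be 'since' or 'until'")
    if parse is None:
        import dateparser  # third-party; imported lazily
        parse = dateparser.parse
    now = now or datetime.now()
    settings = {
        'PREFER_DATES_FROM': 'past',
        # 'since' wants the start of the period, 'until' wants the end.
        'PREFER_DAY_OF_MONTH': 'first' if qualifier == 'since' else 'last',
        'RELATIVE_BASE': now,  # assumption: keeps the parser's "today" equal to `now`
    }
    result = parse(text, settings=settings)
    if qualifier == 'until' and result is not None and result > now:
        # The parser returned a future date, so retry with the current day of month.
        settings['PREFER_DAY_OF_MONTH'] = 'current'
        result = parse(text, settings=settings)
    return result
```

For example, `parse_bound('April', 'until')` is intended to return today's day in April rather than April 30 while April is still in progress. The retry is not guaranteed to land in the past for every input, so a final comparison against the current date may still be worthwhile.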

This could also have been a workable approach for the weekday case, were it not for the bizarre parsing of Monday as Dec 31, 2025, which is unlike all of the others. For that reason, we'll at least pin dateparser at 1.1.8 for now, and may also implement the workaround above to guarantee that 'until' dates for the current month or year are never dates in the future.
