fix: fix logic to compute service window #2029
base: master
Conversation
I see we're getting a stack overflow for this dataset. I'll have to take a closer look.
Force-pushed from ce96443 to bd4b3ff

@davidgamez @jcpitre This is ready for review :-)
Adding the acceptance test manually due to #1960

📝 Acceptance Test Report

📋 Summary
✅ The rule acceptance test has passed for commit 057ab10

📊 Notices Comparison
- New Errors (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.
- Dropped Errors (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.
- New Warnings (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.
- Dropped Warnings (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.

🛡️ Corruption Check
35 out of 1921 sources (~2%) are corrupted.

⏱️ Performance Assessment
📈 Validation Time
Assess the performance in terms of seconds taken for the validation process.
📜 Memory Consumption
davidgamez left a comment:
Great work! I added a few comments before final approval.
Review threads (resolved):
- main/src/main/java/org/mobilitydata/gtfsvalidator/reportsummary/model/ServiceWindow.java (outdated)
- main/src/main/java/org/mobilitydata/gtfsvalidator/util/FuncUtil.java (outdated)
- main/src/main/java/org/mobilitydata/gtfsvalidator/reportsummary/model/ServiceWindow.java
@skalexch can you take a look at this PR for product approval? Thanks!
Adding the acceptance test manually due to #1960. The test is failing due to a high rate of corrupted files, with no concerns regarding notices. This should be fixed after #2024.

📝 Acceptance Test Report

📋 Summary
❌ The rule acceptance test has failed for commit cdd15be

📊 Notices Comparison
- New Errors (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.
- Dropped Errors (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.
- New Warnings (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.
- Dropped Warnings (0 out of 1886 datasets, ~0%) ✅ No changes were detected due to the code change.

🛡️ Corruption Check
45 out of 1931 sources (~2%) are corrupted.

⏱️ Performance Assessment
📈 Validation Time
Assess the performance in terms of seconds taken for the validation process.
📜 Memory Consumption
I see a large increase in memory consumption for the "Minimum in References Reports" metric, but other PRs (this one, for example) have the same thing, so I assume it's nothing to worry about?

I agree; this increase doesn't look like a side effect of this PR.
davidgamez left a comment:
LGTM!
Sorry for not having submitted this earlier; I wrote the comment on Wednesday but did not submit the review. I have just one question regarding the logic.
```java
      return Optional.empty();
    }

    Map<String, Set<LocalDate>> removedDaysByServiceId =
```
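For context, here is a minimal, hypothetical sketch of how a map shaped like removedDaysByServiceId could be built from calendar_dates.txt removal entries. The CalendarDateRow record is an illustrative stand-in, not the validator's GtfsCalendarDate class.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative stand-in for a calendar_dates.txt row; not the validator's real GtfsCalendarDate class.
record CalendarDateRow(String serviceId, LocalDate date, int exceptionType) {}

final class RemovedDays {
  // exception_type = 2 marks a service as removed on that date in GTFS.
  private static final int SERVICE_REMOVED = 2;

  // Group removal exceptions by service_id, mirroring the shape of removedDaysByServiceId.
  static Map<String, Set<LocalDate>> removedDaysByServiceId(List<CalendarDateRow> rows) {
    return rows.stream()
        .filter(r -> r.exceptionType() == SERVICE_REMOVED)
        .collect(Collectors.groupingBy(
            CalendarDateRow::serviceId,
            Collectors.mapping(CalendarDateRow::date, Collectors.toSet())));
  }
}
```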
@sylvansson @davidgamez one question: the function seems to create removedDaysByServiceId using only GtfsCalendarDate (calendar_dates.txt). However, there might be services in calendar.txt that don't have a removed date in calendar_dates.txt.
So I think the check should ensure that each removed date is removed for every service_id that appears either in calendar.txt or in calendar_dates.txt (as an added service: exception_type=1).
E.g.:
calendar.txt: service_id `s_1` from 1-5-2025 to 30-5-2025.
calendar_dates.txt:
- service_id `s_1` removed on 1-5-2025
- service_id `s_3` added on 24-5-2025
- service_id `s_3` removed on 31-5-2025

The service window is 2-5-2025 to 30-5-2025 because:
- May 1st is removed for all services that exist on May 1st (`s_1` only).
- `s_3` was added for May 24th and removed on May 31st, so no service operates on May 31st. If `s_1` were to end on May 31st, then the service window would include that date.

I suggest that for each date, we count the total unique removed service_ids, then compare it with the count of unique service_ids that operate on that date (both from calendar.txt and the added services in calendar_dates.txt), but there might be better ways to do that.
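A rough sketch of this counting idea, assuming calendar.txt and calendar_dates.txt rows are available as simple records (ServicePeriod and DateException are hypothetical stand-ins, and the day-of-week columns of calendar.txt are deliberately ignored to keep the sketch short):

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-ins for calendar.txt and calendar_dates.txt rows.
record ServicePeriod(String serviceId, LocalDate start, LocalDate end) {}
record DateException(String serviceId, LocalDate date, boolean added) {}

final class FullyRemovedDates {
  /** Dates on which every service that would otherwise operate has been removed. */
  static Set<LocalDate> compute(List<ServicePeriod> calendar, List<DateException> exceptions) {
    // service_ids removed on each date (exception_type = 2).
    Map<LocalDate, Set<String>> removedByDate =
        exceptions.stream()
            .filter(e -> !e.added())
            .collect(Collectors.groupingBy(
                DateException::date,
                Collectors.mapping(DateException::serviceId, Collectors.toSet())));

    Set<LocalDate> fullyRemoved = new HashSet<>();
    for (Map.Entry<LocalDate, Set<String>> entry : removedByDate.entrySet()) {
      LocalDate date = entry.getKey();
      // service_ids operating on this date: calendar.txt ranges that cover it,
      // plus services explicitly added for it in calendar_dates.txt.
      Set<String> operating = new HashSet<>();
      for (ServicePeriod p : calendar) {
        if (!date.isBefore(p.start()) && !date.isAfter(p.end())) {
          operating.add(p.serviceId());
        }
      }
      for (DateException e : exceptions) {
        if (e.added() && e.date().equals(date)) {
          operating.add(e.serviceId());
        }
      }
      // The date counts as removed only if every operating service is removed on it.
      if (!operating.isEmpty() && entry.getValue().containsAll(operating)) {
        fullyRemoved.add(date);
      }
    }
    return fullyRemoved;
  }
}
```

In the example above, this yields only May 1st, since `s_1` is the sole operating service that day and it is removed; May 31st has no removal effect because no service operates then.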
Thanks, @skalexch. You are right. The spec doesn't force all service IDs to be defined in both calendar.txt and calendar_dates.txt when both files are present. In that case, we can have service IDs from calendar.txt that are not present in calendar_dates.txt and that still operate on a day removed by a calendar_dates entry.
Following this logic, I agree that we need to keep a total of "all unique service IDs" per day, so we only mark a date as "removed" when none of those services are present. The "all unique service IDs" can be computed by looking at the trips.txt file.
In addition, we need to ensure that added service "expands" the start and end dates of the service window. So we must iterate over all additions and set the start and end dates accordingly, after computing the start and end dates from all calendar ranges.
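A minimal sketch of that final expansion step, assuming the window is tracked as a plain start/end pair (ServiceWindowExpansion and expandWithAdditions are hypothetical names, not the validator's API):

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical sketch of the step described above: after the window start/end have been
// derived from the calendar.txt ranges, widen them so that every date added via
// calendar_dates.txt (exception_type = 1) is covered.
final class ServiceWindowExpansion {
  static LocalDate[] expandWithAdditions(LocalDate start, LocalDate end, List<LocalDate> addedDates) {
    for (LocalDate added : addedDates) {
      if (start == null || added.isBefore(start)) {
        start = added;
      }
      if (end == null || added.isAfter(end)) {
        end = added;
      }
    }
    return new LocalDate[] {start, end};
  }
}
```

For the example above, expanding the May 2 – May 30 window with the May 24 addition leaves it unchanged, since the added date already falls inside the range; an addition outside the range would push the corresponding bound outward.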
Good catch, I'll try to get this fixed sometime next week.
> Good catch, I'll try to get this fixed sometime next week.

Thanks!
Summary:
Closes #2017
Expected behavior:
Before:

After:

I've only tested with the mdb-2138 feed for now. If I understand correctly, the acceptance tests don't validate the feed metadata, so I still need to check a few datasets manually ⚠️
Please make sure these boxes are checked before submitting your pull request - thanks!
- Run `gradle test` to make sure you didn't break anything