feat: Apply saved workflow settings to current crawl #2514
Conversation
I think this could use some documentation and/or help text explaining which updates will actually be applied to a running crawl.
Not knowing what was supported, I started a large domain crawl of a site with one hop out and then attempted to set a lower page limit while the crawl was running; the new limit was not applied and did not prevent additional pages from being added to the queue.
I can imagine other users might have a similar experience and think the feature isn't working as intended.
@ikreymer did you get a chance to look into this?
Not yet, will try tomorrow, but removed it from the milestone in case we don't get to it in time.
Made a change that will cause the crawler to restart when Update Crawl is selected. However, it looks like even though the change in limit is applied to the config, the crawler doesn't actually remove URLs from the queue if the page limit is lowered; that may require a crawler change (on startup, check if the size of the queue exceeds the limit).
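For illustration, a minimal sketch of the startup check suggested above, written in Python for readability (browsertrix-crawler itself is a Node.js project). The function name, arguments, and the plain-list queue are all hypothetical, not the crawler's actual internals; the real crawler keeps its queue elsewhere (e.g., in Redis).

```python
# Hypothetical sketch only: trim a pending URL queue on startup so the
# total page count cannot exceed a newly lowered page limit.

def trim_queue_to_limit(queue: list[str], page_limit: int, done: int) -> list[str]:
    """Return the queue trimmed to the remaining capacity.

    `done` is the number of pages already finished or failed, which
    count against the limit (per the commit message below, which takes
    finished/failed URLs into account).
    """
    remaining = max(0, page_limit - done)
    return queue[:remaining]


# Example: limit lowered to 5 after 3 pages finished/failed, 4 URLs queued:
queue = ["https://example.com/a", "https://example.com/b",
         "https://example.com/c", "https://example.com/d"]
print(trim_queue_to_limit(queue, page_limit=5, done=3))
# -> only the first 2 queued URLs survive
```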
…ew limit, taking into account finished/failed URLs
useful to support dynamically lowering pageLimit when restarting a crawl
fixes issue raised in webrecorder/browsertrix#2514
webrecorder/browsertrix-crawler#821 adds support for lowering pageLimit / removing already-queued URLs when a shorter limit is set.
… that the running crawl, if any, should be updated
the response includes an 'updatedRunning' boolean which is set to true if a running crawl has been updated
the option is ignored if there is no running crawl
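A sketch of how a client might exercise this. The endpoint path, the payload keys, and the request option name (`updateRunning`) are assumptions for illustration; only the `updatedRunning` response field comes from the commit message above.

```python
# Hypothetical client call: the URL, auth, and "updateRunning" request
# option are illustrative assumptions; "updatedRunning" in the response
# is the field named in the commit message.
import requests

API = "https://app.example.com/api"      # hypothetical deployment URL
org_id, workflow_id = "my-org-id", "my-workflow-id"
token = "my-api-token"

resp = requests.patch(
    f"{API}/orgs/{org_id}/crawlconfigs/{workflow_id}",
    json={"config": {"limit": 100}, "updateRunning": True},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

if resp.json().get("updatedRunning"):
    print("a running crawl was updated with the saved settings")
else:
    print("option ignored: no crawl was running")
```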
force-pushed from 6c91ab0 to 2fd2799
Tested and now working as expected, including lowering the limit with the webrecorder/browsertrix-crawler#821 crawler patch. Nice work and thank you!
Resolves #2366
Changes
Allows users to update the current crawl with newly saved workflow settings.
Manual testing
Screenshots