
Conversation

@hjr3 (Contributor) commented Nov 9, 2019

Fixes #40

 // seconds. This ensures the order we crawl URLs is random; otherwise, if
 // we parse a sitemap, we could get stuck crawling one host for hours.
-delay = - Math.random() * YEAR_MS;
+delay = - Math.random() * this._recrawlInMs;
Owner commented:

I don't think this has the intended effect. Notice that delay is negative here. This is simply to randomize new URLs that come onto the queue.
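
For context, here is a minimal sketch (not the project's actual queue code) of the point above: the negative delay only pushes a new URL's ready time into the past by a random amount, which shuffles the order in which newly queued URLs are crawled, but it never schedules a URL to be revisited later. The enqueue helper, the readyAt field, and the queue shape are assumptions for illustration only.

// Hypothetical sketch: a queue that pops the entry with the smallest readyAt first.
const YEAR_MS = 365 * 24 * 60 * 60 * 1000;

function enqueue(queue, url, delay) {
  // A negative delay puts readyAt in the past, so new URLs land at random
  // points before "now" and get interleaved with each other, but every one
  // of them is still immediately eligible to crawl.
  queue.push({ url, readyAt: Date.now() + delay });
  queue.sort((a, b) => a.readyAt - b.readyAt);
}

const queue = [];
for (const url of ['https://a.example/', 'https://b.example/']) {
  const delay = -Math.random() * YEAR_MS; // randomizes order, not recrawl time
  enqueue(queue, url, delay);
}

// Every entry has readyAt <= Date.now(), so changing the multiplier from
// YEAR_MS to a recrawl interval does not make the crawler revisit URLs later.
console.log(queue.map((e) => e.url));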



Development

Successfully merging this pull request may close these issues:

How to periodically crawl again
