Site URL
Description
The robots.txt advertises the sitemap with an http:// URL rather than https://.
Fetching the site root over http:// returns a 301 with a Location of https://, but the sitemap URL is not redirected the same way.
The fetch hangs because http://www.govern.ad/sitemap-index.xml never completes, and there appears to be no timeout (the request is retried).
It then gets stuck fetching the sitemap with query parameters that contain a GUID, over and over, seemingly forever, for example:
DEBUG:usp.helpers:Testing if URL 'https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false' is HTTP(s) URL
INFO:usp.fetch_parse:Fetching level 1 sitemap from https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false...
INFO:usp.helpers:Fetching URL https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false...
DEBUG:urllib3.connectionpool:https://www.govern.ad:443 "GET /sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false HTTP/1.1" 200 0
DEBUG:usp.fetch_parse:Response URL is https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false
INFO:usp.fetch_parse:Parsing sitemap from URL https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false...
DEBUG:usp.fetch_parse:Parent URLs is {'http://www.govern.ad/sitemap-index.xml', 'https://www.govern.ad/sitemap.xml'}
DEBUG:usp.helpers:Testing if URL 'https://www.govern.ad/sitemap.xml?p_l_id=3076&layoutUuid=eb0783e2-72fa-8783-df88-d57ba12196db&groupId=1898932&privateLayout=false' is HTTP(s) URL
INFO:usp.fetch_parse:Fetching level 1 sitemap from https://www.govern.ad/sitemap.xml?p_l_id=3076&layoutUuid=eb0783e2-72fa-8783-df88-d57ba12196db&groupId=1898932&privateLayout=false...
INFO:usp.helpers:Fetching URL https://www.govern.ad/sitemap.xml?p_l_id=3076&layoutUuid=eb0783e2-72fa-8783-df88-d57ba12196db&groupId=1898932&privateLayout=false...
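One possible guard against this kind of runaway fetching, as a minimal sketch: cap how many query-string variants of the same sitemap path get fetched. This is not how USP works today as far as I know; `sitemap_key`, `VariantLimiter` and the default limit of 3 are hypothetical names and values chosen for illustration.

```python
from urllib.parse import urlsplit


def sitemap_key(url: str) -> str:
    """Reduce a sitemap URL to host + path, ignoring scheme, query and fragment."""
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path}"


class VariantLimiter:
    """Hypothetical guard: allow at most `limit` query-string variants per sitemap path."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        # Maps host + path to the set of full URLs already accepted for it.
        self._seen: dict[str, set[str]] = {}

    def should_fetch(self, url: str) -> bool:
        variants = self._seen.setdefault(sitemap_key(url), set())
        if url in variants:
            return False  # this exact URL was already fetched
        if len(variants) >= self.limit:
            return False  # too many variants of the same path
        variants.add(url)
        return True
```

A crawler would call `should_fetch(url)` before each sub-sitemap request and skip the URL when it returns `False`. The trade-off is that a site which legitimately splits its sitemap by query string would be truncated, so the limit would need to be configurable.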
There is a valid sitemap at https://www.govern.ad/sitemap-index.xml, but the package doesn't find it.
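Since the https:// version of the sitemap works, one workaround might be to try the https:// variant of any http:// sitemap URL listed in robots.txt. A rough sketch using only the standard library follows; `https_variant` is a hypothetical helper, not part of USP.

```python
from urllib.parse import urlsplit


def https_variant(url: str) -> str:
    """Return the same URL with an https scheme; non-http URLs are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return url
    return parts._replace(scheme="https").geturl()


print(https_variant("http://www.govern.ad/sitemap-index.xml"))
# → https://www.govern.ad/sitemap-index.xml
```

This assumes the server serves the same sitemap on both schemes, which holds for this site but may not hold in general, so the original URL would still be needed as a fallback.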
See attached
Environment
- Python version: Python 3.12.3
- USP version: 1.4.0
Log and Output Files
- Output log:
- Output text: