
Hang fetching tree for www.govern.ad #98

@donbowman


Site URL

https://www.govern.ad/

Description

The robots.txt lists the sitemap with an http:// URL rather than https://.
Fetching the root over http:// returns a 301 with a Location of https://, but fetching the sitemap over http:// does not redirect.
The parser hangs because the request to http://www.govern.ad/sitemap-index.xml never completes, and there appears to be no timeout (it keeps retrying the request).
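
For illustration only: a client-side timeout turns a request that never completes into an error the caller can handle, instead of a hang. The sketch below uses the standard library's urllib rather than USP's own HTTP client, and fetch_with_timeout is a hypothetical helper. A local socket that accepts connections but never replies stands in for the stalled http:// sitemap request.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def fetch_with_timeout(url: str, timeout_s: float) -> Optional[bytes]:
    """Hypothetical helper: return the body of url, or None if the request fails or stalls.

    urllib applies timeout_s to each blocking socket operation
    (connect, each read), not to the download as a whole.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except (TimeoutError, socket.timeout, urllib.error.URLError):
        # A stalled read raises TimeoutError; connection and HTTP errors raise URLError.
        return None


# A local server that completes the TCP handshake (via the listen backlog)
# but never calls accept() or sends a response.
stalled = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
stalled.bind(("127.0.0.1", 0))
stalled.listen(1)
port = stalled.getsockname()[1]

result = fetch_with_timeout(f"http://127.0.0.1:{port}/sitemap-index.xml", timeout_s=0.5)
stalled.close()
print(result)  # None, after about half a second instead of hanging
```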

It then gets stuck fetching sitemap URLs whose query parameters include a GUID, over and over, indefinitely, for example:

DEBUG:usp.helpers:Testing if URL 'https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false' is HTTP(s) URL
INFO:usp.fetch_parse:Fetching level 1 sitemap from https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false...
INFO:usp.helpers:Fetching URL https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false...
DEBUG:urllib3.connectionpool:https://www.govern.ad:443 "GET /sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false HTTP/1.1" 200 0
DEBUG:usp.fetch_parse:Response URL is https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false
INFO:usp.fetch_parse:Parsing sitemap from URL https://www.govern.ad/sitemap.xml?p_l_id=3072&layoutUuid=6df02011-37ee-5914-c1ac-0320e3be7f3c&groupId=1898932&privateLayout=false...
DEBUG:usp.fetch_parse:Parent URLs is {'http://www.govern.ad/sitemap-index.xml', 'https://www.govern.ad/sitemap.xml'}
DEBUG:usp.helpers:Testing if URL 'https://www.govern.ad/sitemap.xml?p_l_id=3076&layoutUuid=eb0783e2-72fa-8783-df88-d57ba12196db&groupId=1898932&privateLayout=false' is HTTP(s) URL
INFO:usp.fetch_parse:Fetching level 1 sitemap from https://www.govern.ad/sitemap.xml?p_l_id=3076&layoutUuid=eb0783e2-72fa-8783-df88-d57ba12196db&groupId=1898932&privateLayout=false...
INFO:usp.helpers:Fetching URL https://www.govern.ad/sitemap.xml?p_l_id=3076&layoutUuid=eb0783e2-72fa-8783-df88-d57ba12196db&groupId=1898932&privateLayout=false...
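
One way a crawler can guard against this pattern, sketched as an assumption rather than a description of USP's internals: keep a set of URLs already queued so that a sub-sitemap linking back to its index is not refetched, and cap both depth and total fetches so that an index listing thousands of parameterised sub-sitemaps cannot run unbounded. walk_sitemaps and its parameters are hypothetical, and get_children stands in for "fetch and parse one sitemap".

```python
from collections import deque
from typing import Callable, List, Set


def walk_sitemaps(
    root: str,
    get_children: Callable[[str], List[str]],
    max_fetches: int = 1000,
    max_depth: int = 5,
) -> List[str]:
    """Hypothetical breadth-first walk over a sitemap tree with loop and size guards.

    get_children(url) returns the sub-sitemap URLs listed in url
    (an empty list for a leaf or an empty response).
    """
    seen: Set[str] = {root}
    queue = deque([(root, 0)])
    fetched: List[str] = []
    while queue and len(fetched) < max_fetches:
        url, depth = queue.popleft()
        fetched.append(url)
        if depth >= max_depth:
            continue  # do not descend below max_depth
        for child in get_children(url):
            if child not in seen:  # skip URLs already queued or fetched
                seen.add(child)
                queue.append((child, depth + 1))
    return fetched


# Made-up tree: the index lists 10,000 parameterised sub-sitemaps,
# and each sub-sitemap links back to the index.
INDEX = "https://example.com/sitemap.xml"
CHILDREN = [f"{INDEX}?p_l_id={i}" for i in range(10_000)]


def fake_children(url: str) -> List[str]:
    return CHILDREN if url == INDEX else [INDEX]


fetched = walk_sitemaps(INDEX, fake_children, max_fetches=100)
print(len(fetched))  # 100
```

The seen set only stops exact repeats; because every GUID-parameterised URL is distinct, it is the max_fetches cap that bounds the walk in a case like this.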

There is a valid sitemap at https://www.govern.ad/sitemap-index.xml, but the package doesn't find it.
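
A possible workaround, assuming the site serves the same sitemap over https:// (as the report suggests for www.govern.ad): read the Sitemap: lines from robots.txt yourself and rewrite http:// to https:// before fetching. sitemaps_from_robots is a hypothetical helper, not part of USP, and the robots.txt below is made-up sample text.

```python
from typing import List


def sitemaps_from_robots(robots_txt: str, upgrade_to_https: bool = False) -> List[str]:
    """Hypothetical helper: collect Sitemap: URLs from robots.txt text.

    With upgrade_to_https=True, http:// URLs are rewritten to https://,
    assuming the site serves the same sitemap over both schemes.
    """
    urls: List[str] = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "sitemap":
            continue  # not a Sitemap: line
        url = value.strip()
        if upgrade_to_https and url.lower().startswith("http://"):
            url = "https://" + url[len("http://"):]
        if url:
            urls.append(url)
    return urls


robots = """User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap-index.xml
"""
print(sitemaps_from_robots(robots))
# ['http://www.example.com/sitemap-index.xml']
print(sitemaps_from_robots(robots, upgrade_to_https=True))
# ['https://www.example.com/sitemap-index.xml']
```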

See the attached output.txt.

Environment

  • Python version: Python 3.12.3
  • USP version: 1.4.0

Log and Output Files

  • Output log:
  • Output text:

Metadata


Assignees

No one assigned

    Labels

    site: Issues relating to a specific site

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
