
@RAMOhio RAMOhio commented May 8, 2023

I had a case where I needed to generate some sitemaps with another method while still using this library (long story).

I modified the code so that any URL that contains "sitemap" and ends with ".xml" is treated as a sitemap: it is removed from the URL list and added to the sitemap index.

Scenarios:

  • There are under 50,000 URLs and at least one of them is a sitemap: the sitemap index is generated with the sitemap(s) found in the URLs, and sitemap-0.xml contains the remaining URLs (with the sitemap URLs removed).
  • There are over 50,000 URLs and at least one of them is a sitemap: the sitemap index and the sitemap-#.xml routes are generated as usual (with the sitemap URLs removed), and every sitemap found in the URLs is also added to the sitemap index.
  • The URLs contain only sitemaps: the sitemap index is created with those sitemaps and nothing else.
  • The URLs contain no sitemaps: the package behaves the same as it always has.
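
To make the scenarios above concrete, here is a minimal sketch of the partitioning logic. It is not the code from this PR: `isSitemapUrl`, `planSitemaps`, and `SitemapPlan` are hypothetical names, and the order of entries in the index (generated sitemaps first, then the ones found in the URLs) is an assumption. It also always produces index entries, so it does not model whatever the package does today when no sitemap URLs are present.

```typescript
// Hypothetical sketch; names and index ordering are assumptions, not the library's API.
const MAX_URLS_PER_SITEMAP = 50000;

interface SitemapPlan {
  indexEntries: string[]; // paths that would be listed in the sitemap index
  chunks: string[][];     // page URLs grouped per generated sitemap-N.xml
}

// The detection rule described in this PR: contains "sitemap" and ends with ".xml".
function isSitemapUrl(url: string): boolean {
  return url.includes("sitemap") && url.endsWith(".xml");
}

function planSitemaps(
  urls: string[],
  limit: number = MAX_URLS_PER_SITEMAP
): SitemapPlan {
  // Split the input into pre-built sitemaps and ordinary page URLs.
  const found = urls.filter(isSitemapUrl);
  const pages = urls.filter((url) => !isSitemapUrl(url));

  // Chunk the remaining page URLs so no generated sitemap exceeds the limit.
  const chunks: string[][] = [];
  for (let i = 0; i < pages.length; i += limit) {
    chunks.push(pages.slice(i, i + limit));
  }

  // Generated sitemap routes, followed by the sitemaps found in the URLs.
  const generated = chunks.map((_, i) => `/sitemap-${i}.xml`);
  return { indexEntries: [...generated, ...found], chunks };
}
```

With only sitemap URLs as input, `chunks` is empty and `indexEntries` holds just those sitemaps, which matches the third scenario.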

Notes:

  • The implementation could be improved, but it works.
  • If someone uses sitemap-#.xml in their URLs and the package ends up generating that same path, I'm not sure what the behavior would be; I haven't tested that. For my extra sitemaps I use a different route structure than the library does, which is the approach I'd recommend.
  • Sitemaps in the URLs must be on the base domain (I didn't add any parsing to check whether a sitemap URL is external).

@Kikobeats
Collaborator

Hey @RAMOhio, I will be happy to merge this if you add a test 🙂
