Website URL Excel Task

This package contains a Python script for both tasks:

Task 1: Broken links report

Task 2: Download images

In the input Excel file (input_urls.xlsx), keep the first column header as:

URL

Example rows:

Open a terminal or command prompt, install the dependencies, then run the task you need:

pip install requests beautifulsoup4 openpyxl

python scraper_task.py --task broken_links --input input_urls.xlsx --output broken_links_report.xlsx

python scraper_task.py --task download_images --input input_urls.xlsx --output downloaded_images

Some websites block scraping or reject HEAD requests. In that case the script retries with GET.
Relative links such as /about are automatically converted to full URLs.
Links starting with mailto:, tel:, or javascript:, and #anchor links, are ignored.
If a page itself cannot be opened, the script records that page as an error in the report.
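
The link-handling rules above can be illustrated with a short sketch. This is not the code from scraper_task.py; the function name normalize_link and the constant IGNORED_SCHEMES are hypothetical, chosen only for this example. It uses only the standard library and assumes each link arrives as the raw href string taken from an anchor tag.

```python
from urllib.parse import urljoin, urlparse

# Link prefixes that the notes above say are ignored.
# (Hypothetical constant name, for illustration only.)
IGNORED_SCHEMES = ("mailto:", "tel:", "javascript:")


def normalize_link(page_url, href):
    """Return an absolute http(s) URL for href, or None if the link is skipped.

    page_url is the page the link was found on; href is the raw attribute value.
    """
    href = (href or "").strip()

    # Empty links and same-page anchors such as "#top" are skipped.
    if not href or href.startswith("#"):
        return None

    # mailto:, tel:, and javascript: links are skipped (case-insensitive).
    if href.lower().startswith(IGNORED_SCHEMES):
        return None

    # Relative links such as "/about" are resolved against the page URL.
    absolute = urljoin(page_url, href)

    # Assumption: only http and https links are worth checking.
    if urlparse(absolute).scheme not in ("http", "https"):
        return None

    return absolute
```

For example, normalize_link("https://example.com/blog/post", "/about") gives "https://example.com/about", while "#top" and "mailto:someone@example.com" both give None. The sketch does not cover the HEAD-then-GET fallback, which needs network access.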

Excel columns: