A Python script designed to gently and responsibly collect publicly available repair manual images and convert them into a single PDF document. This tool respects server load by implementing rate limiting between requests.
This script was created to make repair manual information more readily accessible while being mindful of server resources. It downloads individual page images from a public source and combines them into a single PDF document for easier consumption.
Note on Responsible Usage: This script is designed for convenience and personal use only. It is not intended to circumvent publishers' advertising impressions, marketing, or other revenue streams. Please support the publishers and vendors who provide valuable documentation by purchasing their products and publications when available. Use this tool responsibly and in accordance with the terms of service of the source websites.
- Downloads individual page images with rate limiting (10-20 second delays between requests)
- Automatically creates a destination folder for downloaded images
- Converts downloaded images into a single PDF document
- Option to clean up individual image files after PDF creation
- Error handling for failed downloads
- Progress tracking during download process
- Requires a source whose page images use sequentially numbered URLs that can be traversed to collect the pages
- Python 3.x
- Required packages (install via pip install -r requirements.txt):
  - requests >= 2.31.0
  - Pillow >= 10.2.0
- Clone this repository or download the source code
- Install the required dependencies:
  pip install -r requirements.txt
- Run the script:
  python fetch.py
The script will:
- Create a downloaded_pages directory if it doesn't exist
- Download each page image with appropriate delays
- Combine all images into a single PDF named Manual.pdf
- Prompt you to delete the individual image files
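The download step listed above (create the folder, fetch one page, report a failure without aborting the run) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in fetch.py: the function name download_page and the 30-second timeout are hypothetical, and it uses the standard library's urllib.request in place of the requests package the script depends on, so the sketch runs without extra installs. With requests, the equivalent calls would be requests.get(url, timeout=30) followed by response.raise_for_status().

```python
import urllib.request
from pathlib import Path


def download_page(url, dest_path, timeout=30):
    """Fetch one page image and write it to dest_path (hypothetical helper).

    Returns True on success and False on failure, so a single bad page
    does not stop the remaining downloads.
    """
    dest = Path(dest_path)
    # Create the destination folder (e.g. downloaded_pages) if it doesn't exist.
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except OSError as exc:
        # URLError, HTTPError (e.g. a 404 past the last page) and socket
        # timeouts all derive from OSError.
        print(f"Failed to download {url}: {exc}")
        return False
    dest.write_bytes(data)
    return True
```

Returning a boolean instead of raising keeps the error handling in one place and lets the caller collect failed pages for a later retry.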
To download a pinball manual, use the default configuration:
BASE_URL = "https://www.planetarypinball.com/reference/partsmanuals/BLY_Parts_1976/files/assets/mobile/page0001_i2.jpg"
TOTAL_PAGES = 220 # Total number of pages in the manual
The script will then download pages in sequence, incrementing the page number in BASE_URL:
- page0001_i2.jpg
- page0002_i2.jpg
- page0003_i2.jpg, etc.
To download a Harley-Davidson service manual, modify the script's configuration:
BASE_URL = "https://www.harley-davidson.com/content/dam/h-d/images/service/service-manuals/2023/2023-softail-service-manual-page-001.jpg"
TOTAL_PAGES = 500 # Adjust based on the manual's page count
The script will then download pages in sequence:
- page-001.jpg
- page-002.jpg
- page-003.jpg, etc.
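Both configurations rely on the page number being embedded in BASE_URL. A minimal sketch of how the page URLs could be generated is shown below; it assumes the number directly follows the word page (optionally after a hyphen) and keeps its zero-padding width. The names build_page_url and page_urls are hypothetical and not necessarily how fetch.py does it.

```python
import re

# Matches "page0001" or "page-001"; group 1 is the zero-padded page number.
PAGE_NUMBER = re.compile(r"page-?(\d+)")


def build_page_url(base_url, page):
    """Return base_url with its page number replaced by `page`, keeping the padding width."""
    matches = list(PAGE_NUMBER.finditer(base_url))
    if not matches:
        raise ValueError(f"No page number found in {base_url!r}")
    match = matches[-1]  # the last match is the one in the file name
    width = len(match.group(1))
    return f"{base_url[:match.start(1)]}{page:0{width}d}{base_url[match.end(1):]}"


def page_urls(base_url, total_pages):
    """Yield the URL for every page from 1 to total_pages."""
    for page in range(1, total_pages + 1):
        yield build_page_url(base_url, page)
```

For example, build_page_url(BASE_URL, 2) with the pinball BASE_URL returns a URL ending in page0002_i2.jpg, and with the Harley-Davidson BASE_URL one ending in page-002.jpg.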
- Individual page images are saved in the downloaded_pages directory
- The final PDF is saved as downloaded_pages/Manual.pdf
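Combining the pages into the PDF and the optional cleanup can be sketched with Pillow as follows. It assumes the local files are saved with zero-padded names such as page-001.jpg, so sorting by name gives page order; the helper names and the y/N prompt wording are hypothetical.

```python
from pathlib import Path


def collect_page_images(folder):
    """Return the page images in `folder`, sorted into page order.

    Assumes zero-padded names like page-001.jpg, so name order == page order.
    Manual.pdf in the same folder is not matched by the pattern.
    """
    return sorted(Path(folder).glob("page-*.jpg"))


def images_to_pdf(image_paths, pdf_path):
    """Write all images into one multi-page PDF using Pillow."""
    if not image_paths:
        raise ValueError("No page images to combine")
    from PIL import Image  # Pillow; imported here so the other helpers work without it

    pages = []
    for path in image_paths:
        with Image.open(path) as img:
            # PDF pages cannot carry an alpha channel, so normalize to RGB.
            pages.append(img.convert("RGB"))
    first, rest = pages[0], pages[1:]
    first.save(pdf_path, "PDF", save_all=True, append_images=rest)


def prompt_cleanup(image_paths, answer=None):
    """Ask whether to delete the individual images; `answer` bypasses input() for testing."""
    if answer is None:
        answer = input("Delete the individual image files? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        return False
    for path in image_paths:
        Path(path).unlink()
    return True
```

Typical usage: pass collect_page_images("downloaded_pages") to images_to_pdf together with "downloaded_pages/Manual.pdf", then hand the same list to prompt_cleanup. Deleting only after the PDF has been written means a failed conversion never loses the downloaded pages.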
The script waits a random 10 to 20 seconds between requests to avoid overwhelming the server. This is a responsible approach to web scraping that respects server resources.
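The delay described above can be sketched as a loop that pauses for a random 10 to 20 seconds between requests (not after the last one) and prints progress as it goes. The function name fetch_all and its download and sleep parameters are hypothetical; they are injected so the loop can be exercised without network access or real waiting.

```python
import random
import time


def fetch_all(jobs, download, min_delay=10.0, max_delay=20.0, sleep=time.sleep):
    """Download each (url, dest) job in order with a random pause between requests.

    `download(url, dest)` should return True on success. Returns the URLs
    that failed so they can be retried or reported.
    """
    failures = []
    total = len(jobs)
    for index, (url, dest) in enumerate(jobs, start=1):
        # Progress tracking: show which page is being fetched out of how many.
        print(f"[{index}/{total}] {url}")
        if not download(url, dest):
            failures.append(url)
        if index < total:
            # Uniformly random pause, so requests are spread out rather than
            # arriving at a fixed rhythm; no pause after the final page.
            sleep(random.uniform(min_delay, max_delay))
    return failures
```

With the default sleep=time.sleep, a 220-page manual incurs 219 pauses of 10 to 20 seconds, roughly 36 to 73 minutes of waiting on top of the download time itself.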
Author: Josh M
Created: March 2025
Version: 2.0.0
This project is open source and available under the MIT License.