A simple Python script to download files from a website recursively based on specified file extensions. It can download a variety of file types, including images, videos, documents, and more.
- Recursively scrape a website for downloadable files.
- Download files based on specified extensions (or all files if no extensions are specified).
- Support a wide range of file extensions (images, videos, documents, and more).
- Log download progress with color-coded messages for easy tracking.
- Python 3.x
requestslibrarybeautifulsoup4librarycoloramalibrary
You can install the required libraries by running:
pip install -r requirements.txtpython scrapper.py <URL> <extensions> <download_directory><URL>: The URL of the website you want to scrape.<extensions>: A list of file extensions to download. Use*for all file types, or specify extensions like*.jpg|*.png|*.pdf.<download_directory>: The local directory where you want to save the downloaded files.
python scrapper.py https://example.com "*" ./downloadsThis command will download all files from the specified URL.
python scrapper.py https://example.com "*jpg|png" ./downloadsThis command will download only .jpg images and files named logo.png.
- Recursion: The script uses recursion to follow links found on the website and scrape additional pages.
- File Matching: You can specify which files to download by using patterns. Use
*as a wildcard for any extension. - Directory Management: The script will create the download directory if it doesn't exist.
- Error Handling: The script logs any errors that occur during the downloading process and will skip files that cannot be downloaded.
- The script processes HTML pages and looks for all links (
<a href="...">), downloading files from them if they match the specified extensions. - If no extensions are provided, the script will attempt to download all files it finds.
- The script is designed to be simple and easy to use, while also providing useful logging information.
This project is licensed under the MIT License.