Releases: Togeee12/web-scraper-project
v1.0.0 – Initial Release
Overview
First public release of the Web Scraping Tool! A simple Python-based CLI tool for extracting links, email addresses, social media links, author names, phone numbers, and more.
Features
- Extracts:
  - Links
  - Email addresses
  - Social media profiles (Facebook, Twitter, Instagram, etc.)
  - Author names
  - Phone numbers (country-specific)
  - Images (with optional download)
  - Documents (PDF, DOCX, XLSX, etc.)
  - Tables (with optional CSV export)
  - Metadata (title, meta tags)
- Output to terminal (with colors) or file
- Supports TXT, JSON, CSV, Markdown, Excel, and SQLite formats
- Recursive and parallel scraping
- Live preview mode
- Scheduled scraping
- Data filtering and processing (deduplication, sorting)
- Modular codebase for easy extension
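The project's own extraction code is not reproduced here. As a rough illustration of how link, email, and social-profile extraction can work, here is a minimal sketch using only the standard library. The tool itself lists beautifulsoup4 as a dependency; `html.parser.HTMLParser` is swapped in so the example runs without it. The names `extract`, `is_social`, `LinkCollector`, and `SOCIAL_DOMAINS` are hypothetical, and the email pattern is deliberately simple.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Simple address pattern; it can over-match strings like "logo@2x.png".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical domain list, not taken from the project.
SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "x.com", "instagram.com", "linkedin.com")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative links become absolute; other schemes (mailto:) pass through.
                self.links.append(urljoin(self.base_url, href))


def is_social(url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Match the domain or a subdomain (www.facebook.com), but not e.g. "netflix.com".
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)


def extract(html: str, base_url: str) -> dict[str, list[str]]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return {
        "links": collector.links,
        # Scans the raw markup, so mailto: targets and plain-text addresses are both found.
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "social": [link for link in collector.links if is_social(link)],
    }
```

Calling `extract(page_html, "https://example.com/")` on a downloaded page returns the three lists; phone numbers, images, and tables would need their own extractors.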
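One way to combine recursive scraping with parallel fetching is a level-by-level breadth-first crawl in which each level's pages are fetched concurrently by a thread pool. This is a sketch under stated assumptions, not the project's implementation: `crawl` and the caller-supplied `get_links` function are hypothetical, and `depth` here counts link hops from the start page (the tool's own depth setting may count differently).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    depth: int = 1,
    max_workers: int = 4,
) -> set[str]:
    """Return every URL reachable from start_url within `depth` link hops.

    get_links(url) is a hypothetical caller-supplied function that fetches
    a page and returns the URLs it links to.
    """
    seen = {start_url}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(depth):
            next_frontier = []
            # Fetch all pages of the current level concurrently.
            for links in pool.map(get_links, frontier):
                for link in links:
                    if link not in seen:  # skip pages already visited or queued
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            if not frontier:  # nothing new to visit
                break
    return seen
```

With real pages, `get_links` would download the URL (for example with requests) and run a link extractor on it; threads suit this because the work is mostly waiting on the network.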
Usage
- Clone the repo:
git clone https://github.com/Togeee12/web-scraper-project.git
cd web-scraper-project
- Install dependencies:
pip install -r requirements.txt
- Run the script:
python main.py --url <website_url> --output <terminal|file> [options]
Key Arguments:
- `--url` (required): Website URL to scrape.
- `--output`: Output mode (`terminal` or `file`).
- `--format`: File format (`txt`, `json`, `csv`, `md`, `xlsx`, `sqlite`).
- `--filename`: Output filename.
- `--country`: Country code for phone numbers (default: `US`).
- `--depth`: Depth for recursive scraping.
- `--recursive`: Enable recursive scraping.
- `--parallel`: Enable parallel scraping.
- `--urls`: List of URLs for parallel scraping.
- `--max-workers`: Number of parallel workers.
- `--schedule`: Schedule scraping every X hours.
- `--schedule-output`: Output file for scheduled scraping.
- `--filter-keyword`: Filter results by keyword.
- `--filter-regex`: Filter results by regex pattern.
- `--process`: Deduplicate and sort data.
- `--download-images`: Download images locally.
- `--live-preview`: Enable live preview mode.
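These arguments map naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction from the argument list alone, not the project's `main.py`: the integer types for `--depth` and `--max-workers`, fractional hours for `--schedule`, `nargs="+"` for `--urls`, and the absence of any default other than `--country US` are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the project's main.py)."""
    p = argparse.ArgumentParser(description="Web scraping tool")
    p.add_argument("--url", required=True, help="Website URL to scrape")
    p.add_argument("--output", choices=["terminal", "file"], help="Output mode")
    p.add_argument("--format", choices=["txt", "json", "csv", "md", "xlsx", "sqlite"],
                   help="File format")
    p.add_argument("--filename", help="Output filename")
    p.add_argument("--country", default="US", help="Country code for phone numbers")
    # Recursive and parallel scraping (int types are assumptions).
    p.add_argument("--depth", type=int, help="Depth for recursive scraping")
    p.add_argument("--recursive", action="store_true", help="Enable recursive scraping")
    p.add_argument("--parallel", action="store_true", help="Enable parallel scraping")
    p.add_argument("--urls", nargs="+", help="URLs for parallel scraping")
    p.add_argument("--max-workers", type=int, help="Number of parallel workers")
    # Scheduling (fractional hours are an assumption).
    p.add_argument("--schedule", type=float, help="Scrape every X hours")
    p.add_argument("--schedule-output", help="Output file for scheduled scraping")
    # Filtering, processing, and extras.
    p.add_argument("--filter-keyword", help="Filter results by keyword")
    p.add_argument("--filter-regex", help="Filter results by regex pattern")
    p.add_argument("--process", action="store_true", help="Deduplicate and sort data")
    p.add_argument("--download-images", action="store_true", help="Download images locally")
    p.add_argument("--live-preview", action="store_true", help="Enable live preview mode")
    return p
```

`build_parser().parse_args()` then exposes hyphenated flags as attributes with underscores, for example `args.max_workers` and `args.filter_regex`; an unsupported `--format` value makes argparse print an error and exit.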
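The filtering and processing options describe simple list operations. A minimal sketch, assuming keyword matching is case-insensitive, regex matching uses `re.search` (a match anywhere in the item), and "deduplicate and sort" means dropping exact duplicates and sorting alphabetically; `filter_results` and `process` are hypothetical names.

```python
import re


def filter_results(items, keyword=None, pattern=None):
    """Keep items containing `keyword` (case-insensitive) and matching `pattern`."""
    if keyword:
        items = [i for i in items if keyword.lower() in i.lower()]
    if pattern:
        rx = re.compile(pattern)
        items = [i for i in items if rx.search(i)]
    return list(items)


def process(items):
    """Drop exact duplicates, then sort alphabetically."""
    return sorted(set(items))
```

Applying the filter before `process` keeps the sort working on the smaller, already-filtered list.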
🛠️ Dependencies
- beautifulsoup4
- requests
- colorama
- phonenumbers
- tqdm
- pandas (for Excel/CSV export)
- openpyxl (for Excel export)
- schedule (for scheduled scraping)
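pandas and openpyxl are listed for the Excel/CSV export. For JSON, CSV, and SQLite output, Python's standard library alone is enough; the sketch below shows one way to write scraped results stored as a list of dicts. The `export` function, the `results` table name, and the row layout are assumptions, and XLSX is left out because it needs openpyxl.

```python
import csv
import json
import sqlite3
from pathlib import Path


def export(rows: list[dict], filename: str, fmt: str) -> None:
    """Write rows (dicts sharing the same keys) as json, csv, or sqlite."""
    if not rows:
        return  # nothing to write
    path = Path(filename)
    cols = list(rows[0])
    if fmt == "json":
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    elif fmt == "csv":
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=cols)
            writer.writeheader()
            writer.writerows(rows)
    elif fmt == "sqlite":
        # Column names are quoted; they must not contain double quotes.
        col_sql = ", ".join(f'"{c}"' for c in cols)
        placeholders = ", ".join("?" for _ in cols)
        conn = sqlite3.connect(path)
        with conn:  # commits the transaction on success
            conn.execute(f"CREATE TABLE IF NOT EXISTS results ({col_sql})")
            conn.executemany(
                f"INSERT INTO results ({col_sql}) VALUES ({placeholders})",
                [tuple(r[c] for c in cols) for r in rows],
            )
        conn.close()  # the context manager commits but does not close
    else:
        raise ValueError(f"unsupported format: {fmt}")
```

Because the SQLite branch appends to an existing `results` table, repeated runs (for example, scheduled scrapes) accumulate rows in one database file.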
Install all dependencies with:
pip install -r requirements.txt
🙏 Acknowledgments
- Created by Togeee12
- Thanks to the developers of the Python libraries listed above