
Releases: Togeee12/web-scraper-project


18 Aug 09:14


v1.0.0 – Initial Release

Overview

First public release of the Web Scraping Tool: a simple, Python-based CLI for extracting links, email addresses, social media profiles, author names, phone numbers, images, documents, tables, and page metadata.

Features

  • Extracts:
    • Links
    • Email addresses
    • Social media profiles (Facebook, Twitter, Instagram, etc.)
    • Author names
    • Phone numbers (country-specific)
    • Images (with optional download)
    • Documents (PDF, DOCX, XLSX, etc.)
    • Tables (with optional CSV export)
    • Metadata (title, meta tags)
  • Output to terminal (with colors) or file
  • Supports TXT, JSON, CSV, Markdown, Excel, and SQLite formats
  • Recursive and parallel scraping
  • Live preview mode
  • Scheduled scraping
  • Data filtering and processing (deduplication, sorting)
  • Modular codebase for easy extension
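To illustrate the kind of extraction the features above describe, here is a minimal, self-contained sketch of link and email extraction using only the standard library. This is not the repository's actual implementation (the real tool likely uses richer parsing); the function names and regex are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Simplified email pattern for illustration; real-world matching is messier.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags as the parser walks the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract(html):
    """Return (links, emails) found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    emails = EMAIL_RE.findall(html)
    return parser.links, emails

html = '<a href="https://example.com">site</a> contact: admin@example.com'
links, emails = extract(html)
print(links)   # ['https://example.com']
print(emails)  # ['admin@example.com']
```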

Usage

  1. Clone the repo:
git clone https://github.com/Togeee12/web-scraper-project.git
    cd web-scraper-project
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the script:
    python main.py --url <website_url> --output <terminal|file> [options]
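The command-line interface above can be sketched with `argparse`. This is an assumption about how `main.py` might wire up the documented flags, not the repository's actual code; flag names and choices are taken from the argument list in this release note.

```python
import argparse

def build_parser():
    """Minimal argparse sketch mirroring the documented flags;
    the real main.py may define these differently."""
    p = argparse.ArgumentParser(description="Web Scraping Tool (sketch)")
    p.add_argument("--url", required=True, help="Website URL to scrape")
    p.add_argument("--output", choices=["terminal", "file"], default="terminal")
    p.add_argument("--format", choices=["txt", "json", "csv", "md", "xlsx", "sqlite"],
                   default="txt")
    p.add_argument("--country", default="US", help="Country code for phone numbers")
    return p

# Example invocation: scrape a URL and write JSON output to a file.
args = build_parser().parse_args(
    ["--url", "https://example.com", "--output", "file", "--format", "json"]
)
print(args.format)  # json
```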

Key Arguments:

  • --url (required): Website URL to scrape.
  • --output: Output mode (terminal or file).
  • --format: File format (txt, json, csv, md, xlsx, sqlite).
  • --filename: Output filename.
  • --country: Country code for phone numbers (default: US).
  • --depth: Depth for recursive scraping.
  • --recursive: Enable recursive scraping.
  • --parallel: Enable parallel scraping.
  • --urls: List of URLs for parallel scraping.
  • --max-workers: Number of parallel workers.
  • --schedule: Schedule scraping every X hours.
  • --schedule-output: Output file for scheduled scraping.
  • --filter-keyword: Filter results by keyword.
  • --filter-regex: Filter results by regex pattern.
  • --process: Deduplicate and sort data.
  • --download-images: Download images locally.
  • --live-preview: Enable live preview mode.
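The filtering and processing flags (`--filter-keyword`, `--filter-regex`, `--process`) can be sketched as a small pipeline. The `process` function below is a hypothetical stand-in for whatever the tool does internally, shown only to make the flags' semantics concrete.

```python
import re

def process(items, keyword=None, pattern=None, dedupe=True, sort=True):
    """Apply the steps the CLI flags describe: keyword filter,
    regex filter, deduplication, then sorting."""
    if keyword:
        items = [i for i in items if keyword.lower() in i.lower()]
    if pattern:
        rx = re.compile(pattern)
        items = [i for i in items if rx.search(i)]
    if dedupe:
        items = list(dict.fromkeys(items))  # drop repeats, keep first-seen order
    if sort:
        items = sorted(items)
    return items

data = ["b@x.com", "a@x.com", "b@x.com", "c@y.org"]
print(process(data, keyword="x.com"))  # ['a@x.com', 'b@x.com']
```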

🛠️ Dependencies

Install all dependencies with:

pip install -r requirements.txt

🙏 Acknowledgments

  • Created by Togeee12
  • Thanks to the developers of the open-source Python libraries this project depends on