A PHP-based web crawler for collecting structured metadata from Taaghche’s digital bookstore, specifically targeting books by a given publisher. Automatically handles pagination and exports comprehensive book details to JSON for further use in data analysis, archival, or integration with other systems.
This is a simple PHP-based crawler for fetching all books from Taaghche by a specific publisher. The result is saved as a JSON file (taaghche_books.json) for further analysis or processing.
- Crawls all books from Taaghche with a specific publisher filter.
- Handles pagination using the nextOffset value.
- Outputs a full list of books to a local JSON file.
- Includes metadata like title, author, rating, price, and cover image URL.
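The pagination behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `crawlAllBooks`, `saveBooks`, the `$fetchPage` callable, and the `books` response key are hypothetical names chosen for the example; only `nextOffset` comes from this README. It also assumes the API omits `nextOffset` (or returns null) on the last page.

```php
<?php
declare(strict_types=1);

/**
 * Collects every book by following nextOffset until it is absent.
 *
 * $fetchPage is a hypothetical callable: it receives the current offset
 * (null for the first page) and returns the decoded API response as an
 * array. The 'books' key is an assumed name, not confirmed by the source.
 */
function crawlAllBooks(callable $fetchPage, int $maxPages = 1000): array
{
    $books = [];
    $offset = null;

    // $maxPages is a safety cap so a misbehaving API cannot loop forever.
    for ($page = 0; $page < $maxPages; $page++) {
        $response = $fetchPage($offset);
        $books = array_merge($books, $response['books'] ?? []);

        $next = $response['nextOffset'] ?? null;
        // Stop when there is no further page or the offset does not advance.
        if ($next === null || $next === $offset) {
            break;
        }
        $offset = $next;
    }

    return $books;
}

/** Writes the collected books to a JSON file, keeping Persian text readable. */
function saveBooks(array $books, string $path): void
{
    $flags = JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE
        | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR;
    file_put_contents($path, json_encode($books, $flags));
}
```

Passing the page fetcher in as a callable keeps the loop independent of the HTTP client, so it can be exercised with canned responses before pointing it at the real endpoint.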
- Clone this repository:
  git clone https://github.com/BaseMax/taaghche-book-crawler.git
  cd taaghche-book-crawler
- Run the PHP script:
  php taaghche.com.php
The result will be saved to:
taaghche_books.json
💡 The default publisher ID is 645 (انتشارات). You can modify the filters parameter in the script to target other publishers or criteria.
Each book in taaghche_books.json includes fields such as:
- id
- title
- authors
- price
- publisher
- rating
- coverUri
- publishDate
- Any other fields that Taaghche includes in the API response.
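As a rough example of consuming the output, the sketch below prints one line per book using the `id`, `title`, and `price` fields listed above. It assumes taaghche_books.json holds a flat JSON array of book objects and that these three fields are scalar values; `summarizeBooks` is a hypothetical helper, not part of the crawler.

```php
<?php
declare(strict_types=1);

/**
 * Returns one "id | title | price" line per book.
 * Assumes the JSON is a flat array of book objects; missing fields become "?".
 */
function summarizeBooks(string $json): array
{
    $books = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $lines = [];
    foreach ($books as $book) {
        $lines[] = sprintf(
            '%s | %s | %s',
            $book['id'] ?? '?',
            $book['title'] ?? '?',
            $book['price'] ?? '?'
        );
    }

    return $lines;
}

// Usage: print a summary of the crawler output if the file is present.
$path = 'taaghche_books.json';
if (is_file($path)) {
    $raw = file_get_contents($path);
    if ($raw !== false) {
        echo implode(PHP_EOL, summarizeBooks($raw)), PHP_EOL;
    }
}
```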
MIT License
© 2025 Max Base