IMDB Scraper

This project contains a set of scripts to scrape and process IMDB data. The scripts perform the following tasks:

1. Fetch URLs from an IMDB search page.
2. Extract movie IDs from the fetched URLs.
3. Retrieve detailed movie information using the extracted IDs.
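
The ID extraction step can be illustrated with a short sketch. It assumes the fetched URLs follow IMDb's usual `/title/tt<digits>/` shape; the function name `extract_movie_ids` is hypothetical and is not taken from the project's scripts.

```python
import re

# IMDb title IDs appear in URLs as "tt" followed by digits, e.g. /title/tt0903747/
IMDB_ID_PATTERN = re.compile(r"/title/(tt\d+)")


def extract_movie_ids(urls):
    """Return unique IMDb IDs in first-seen order; URLs without an ID are skipped."""
    seen = set()
    ids = []
    for url in urls:
        match = IMDB_ID_PATTERN.search(url)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids
```

Duplicates are dropped because the same title can show up more than once in scraped search results.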

Scripts Overview

1. `args.py`

Defines command-line arguments for the scripts.
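
As a rough idea of what such a module might look like, here is a minimal `argparse` sketch built from the flags listed under How to Run. The types, defaults, help strings, and the `build_parser` name are assumptions, not the actual contents of `args.py`.

```python
import argparse


def build_parser():
    """Build a parser for the flags documented in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Scrape IMDB search results.")
    parser.add_argument("--url", required=True, help="IMDB search URL to scrape")
    parser.add_argument("--click_count", type=int, default=0,
                        help='times to click the "Show 50 more" button')
    parser.add_argument("--project_name", default="imdb",
                        help="prefix for output files")
    parser.add_argument("--num_workers", type=int, default=4,
                        help="number of concurrent threads")
    parser.add_argument("--request_interval", type=float, default=3.0,
                        help="seconds to wait between requests")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["--url", "https://www.imdb.com/search/title/",
                                  "--click_count", "50"])
```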

2. `utils.py`

Contains utility functions for parsing URLs, processing movie IDs, and interacting with web elements.

Functions:

- `parse_url_csv(file_path, project_name)`: Parses a CSV file to extract and clean IMDB IDs.
- `fetch_and_parse_movie_info(movie_id, imdb, request_interval)`: Fetches and parses movie information by ID.
- `process_movie_ids(movie_id_list, final_results, num_workers, imdb, request_interval, project_name)`: Processes a list of movie IDs and saves results to CSV.
- `chunk_movie_id_list(movie_id_list, chunk_size)`: Yields chunks of movie IDs for batch processing.
- `scroll_to_element(driver, by, value)`: Scrolls to a web element.
- `click_element(driver, by, value, retries=3)`: Clicks a web element with retries.
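
To make the batching idea concrete, a generator along the lines of `chunk_movie_id_list` could look like the sketch below. It reuses the signature listed above, but the body is an assumption about how chunking might be done, not the project's code.

```python
def chunk_movie_id_list(movie_id_list, chunk_size):
    """Yield consecutive slices of at most chunk_size IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(movie_id_list), chunk_size):
        yield movie_id_list[start:start + chunk_size]


# The last chunk is shorter when the list length is not a multiple of chunk_size.
batches = list(chunk_movie_id_list(["tt1", "tt2", "tt3", "tt4", "tt5"], 2))
```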

3. `main.py`

The main script that uses Selenium to scrape IMDB, extract URLs, and process movie information.
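
The repeated "Show 50 more" clicking can be sketched as a retry loop. This is an illustration only: the `delay` parameter and the `click_repeatedly` helper are hypothetical, and the broad `except Exception` is a simplification that keeps the sketch free of Selenium imports. The only driver calls used are `find_element(by, value)` and `click()`.

```python
import time


def click_element(driver, by, value, retries=3, delay=1.0):
    """Try to click an element; return True on success, False once retries run out."""
    for _ in range(retries):
        try:
            driver.find_element(by, value).click()
            return True
        except Exception:  # assumption: real code would catch Selenium's own exceptions
            time.sleep(delay)  # give the page time to load before retrying
    return False


def click_repeatedly(driver, by, value, click_count, delay=1.0):
    """Click the same element up to click_count times; return the number of successful clicks."""
    clicks = 0
    for _ in range(click_count):
        if not click_element(driver, by, value, delay=delay):
            break  # the button is probably gone, so stop early
        clicks += 1
    return clicks
```

Stopping at the first element that cannot be clicked avoids waiting through the remaining iterations when the result list has no more pages.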

Requirements

selenium==4.23.0
chromedriver_autoinstaller==0.6.4
pandas==2.2.2
beautifulsoup4==4.12.3
tqdm==4.66.4
PyMovieDb==0.0.9

How to Run

1. Set up a Python environment and install the packages listed under Requirements.
2. Run the main script:
```
python main.py --url "https://www.imdb.com/search/title/?title_type=tv_series,tv_miniseries" --click_count 50 --project_name "series" --num_workers 4 --request_interval 3
```
- `--url`: IMDB search URL to scrape.
- `--click_count`: Number of times to click the "Show 50 more" button.
- `--project_name`: Prefix for output files.
- `--num_workers`: Number of concurrent threads for processing.
- `--request_interval`: Interval between requests in seconds.
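
How `--num_workers` and `--request_interval` might interact is shown in the sketch below. It assumes a thread pool from `concurrent.futures` and a pause after each request inside every worker. The README does not say how the project implements this, and `process_ids`, `fetch_with_interval`, and the `fetch` callable are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_interval(fetch, movie_id, request_interval):
    """Call fetch(movie_id), then pause so this worker spaces out its requests."""
    result = fetch(movie_id)
    time.sleep(request_interval)
    return result


def process_ids(movie_ids, fetch, num_workers=4, request_interval=3.0):
    """Fetch all IDs with num_workers threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(
            lambda movie_id: fetch_with_interval(fetch, movie_id, request_interval),
            movie_ids,
        ))
```

Under these assumptions the overall request rate is roughly `num_workers / request_interval` requests per second, so raising the worker count without raising the interval increases load on the site.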

The script writes the scraped movie URLs and the detailed movie information to CSV files whose names start with the `--project_name` prefix.

Acknowledgements

Special thanks to the PyMovieDb project for providing the IMDb API integration used in this script.
