
This is the second of my final projects on concurrency. For Project 1, I built a CLI (command-line interface) concurrent downloader that downloads multiple websites concurrently from the terminal. In this second project, I rebuilt Project 1 from scratch on my own, with improvements. Read more in benchmark_documents.


Concurrency-Project-2-CLI-Concurrent-Downloader-With-Improvements-On-Features-and-Benchmarks

This is the second of my final projects on concurrency. For Project 1, I built a CLI (command-line interface) concurrent downloader that downloads multiple websites concurrently from the terminal. Check benchmark_documents for more details on my benchmarks and analysis. Scroll to near the bottom to see how to use this tool.

In this second project, I reconstructed Project 1 entirely from scratch, relying only on my own understanding and memory of Project 1, with no help from AI code or other source code. I also fixed edge cases and potential bugs and added improvements such as:

1/ Added a timeout to prevent waiting forever.

2/ Moved log.txt to a separate folder to avoid naming confusion in the downloads folder.

3/ Added try/except blocks to handle all possible errors gracefully.

4/ Added a method named "mix", which runs all three models (multiprocessing, threading, and asyncio) at once, each submitted to a ThreadPoolExecutor so they execute concurrently. This method is a fun mix and worth benchmarking alongside the others to better recognize behavior differences across benchmarks.

5/ Added a terminal log reporting how many files succeeded/failed after downloading, for asyncio and threading. Unfortunately, it is not applied to multiprocessing, since the counter is not process-safe.

6/ Added renaming logic for downloaded files to prevent overwriting existing files, which would lead to incorrect results. A while loop checks whether the file name already exists; if it does, a counter is incremented until a free name is found, and that counter is appended to the name to make it unique.

7/ Updated the log in log.txt to include the name of the method that produced each entry, for better clarity, by passing the method name as a parameter to logger.py.

8/ Added a Lock() for both threading and asyncio to avoid race conditions when updating the successful/failed download counts and when writing to log.txt.

In this downloader I implemented the threading, asyncio, multiprocessing, and mix models, benchmarked them on total run time and number of successful downloads, ranked them by successful files downloaded per second, and analyzed the results.
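The ranking metric described above (successful files downloaded per second) can be computed with a small helper like this. The sample numbers are made up for illustration and are not real benchmark results:

```python
def rank_methods(results: dict[str, tuple[int, float]]) -> list[tuple[str, float]]:
    """Rank methods by successful downloads per second, best first.

    results maps method name -> (successful_files, total_seconds).
    """
    scored = [(name, ok / secs) for name, (ok, secs) in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Illustrative numbers only, not real benchmark data:
sample = {
    "threading": (20, 4.0),        # 5.0 files/s
    "asyncio": (20, 2.5),          # 8.0 files/s
    "multiprocessing": (18, 6.0),  # 3.0 files/s
}
```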

By the way, there are logs for the threading and asyncio downloads, but unfortunately no log for the multiprocessing approach, because the logger is not process-safe.

You can refer to my old version of CLI concurrent downloader here:

https://github.com/WillyPhan06/Concurrency-Project-CLI-Concurrent-Downloader

Here is a link to my Software Architect and DevSecOps Engineer road map:

https://github.com/WillyPhan06/Software-Architect-and-DevSecOps-Engineer-Road-Map

Here is a high-level architecture diagram for both Project 1 and Project 2 of the CLI Concurrent Downloader:

High Level Design of CLI Concurrent Downloader

📘 HOW TO USE - Concurrent Downloader CLI Tool

This project allows you to download multiple files concurrently using threading, asyncio, or multiprocessing in Python.

✅ Requirements

Python 3.10+

Visual Studio Code (VS Code) is recommended

🧠 Step-by-Step Instructions

🔧 Step 0: Clone the Repository

Open VS Code. Press Ctrl + ~ to open the terminal.

Navigate to a folder where you want to store the project: cd path/to/your/folder

Clone the repo: git clone https://github.com/WillyPhan06/Concurrency-Project-CLI-Concurrent-Downloader

Navigate into the project folder: cd Concurrency-Project-CLI-Concurrent-Downloader

🐍 Step 1: Set Up Virtual Environment

Run the following command to create a virtual environment: python -m venv venv

Activate the virtual environment:

On Windows (CMD): venv\Scripts\activate

On Windows PowerShell (recommended if you are using the VS Code terminal on Windows): .\venv\Scripts\Activate.ps1

On Mac/Linux: source venv/bin/activate

Install dependencies (if any are added in future): pip install -r requirements.txt

🌐 Step 2: Edit urls.txt

Open urls.txt in VS Code. Add one URL per line that you want to download content from.

Example:

https://example.com/file1.jpg
https://example.com/file2.jpg

📥 Step 3: Run the Downloader

Make sure you're in the main project folder (Concurrency-Project-CLI-Concurrent-Downloader).

Choose one of the following methods:

▶️ Download using Threading: python concurrent_downloader/main.py --method threading --url-file urls.txt --output-dir downloads

▶️ Download using Asyncio: python concurrent_downloader/main.py --method asyncio --url-file urls.txt --output-dir downloads

▶️ Download using Multiprocessing: python concurrent_downloader/main.py --method multiprocessing --url-file urls.txt --output-dir downloads
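Based on the flags shown in the commands above, the CLI's argument parsing presumably looks something like the following argparse sketch. The actual main.py may differ; the defaults and help strings here are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags: --method, --url-file, --output-dir."""
    parser = argparse.ArgumentParser(description="CLI concurrent downloader")
    parser.add_argument("--method",
                        choices=["threading", "asyncio", "multiprocessing", "mix"],
                        required=True,
                        help="Concurrency model to use")
    parser.add_argument("--url-file", default="urls.txt",
                        help="Text file with one URL per line")
    parser.add_argument("--output-dir", default="downloads",
                        help="Folder where downloaded files are saved")
    return parser
```

argparse converts `--url-file` and `--output-dir` to the attributes `args.url_file` and `args.output_dir` automatically.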

⏳ Step 4: Wait for Completion

The terminal will show a "Done" message along with the total time taken.

Progress and status will be printed while downloading.

📁 Step 5: Check Your Downloads

All downloaded files will be saved inside the downloads/ folder.

If you used threading or asyncio, a log.txt file will also be generated inside downloads/, containing detailed logs of the download process.

💡 Tips

Make sure the URLs in urls.txt are valid and accessible.

To restart, just clear downloads/ and re-run a method.
