Educative.io Scraper -- Educative.io Downloader

Description: 
This tool scrapes and saves Educative.io courses for offline use, so you can
learn at your own pace, even without an internet connection.

Contributions:
I wholeheartedly welcome contributions from individuals in any capacity to enhance this project.
Thank you for your support!

Disclaimer:
I want to clarify that I am not accountable for any inappropriate use of this scraper. 
I developed it solely for research purposes and take no responsibility for its misuse.

Repository Version: v3.9.6 (Recommended)
Master Branch: v3-master

1. Updates Information
-  Add terminal mode feature - Run `python EducativeScraper.py --help` for details (v3.9.5)
-  Single file implementation changed - uses extension to get page data. (v3.9.1)
-  Cloudlabs, Projects can now be scraped using Auto/Manual Scraper (v3.9.0) 
-  SandPack code container support added
-  SingleFile injection script now uses single-file-cli logic instead of custom (v3.7.4) 
-  Injects SingleFile locally. The script is about 10 seconds faster per topic, and iframe handling is fixed (v3.7.3)
-  AutoFixUrl and AutoResume were added (v3.6.5+)
      * AutoFixUrl: The urls text file will be automatically updated based on the last topic URL from the Log File.
        If a manual edit was done on the text file then consider unchecking this option.
      * AutoResume (Recommended): Retries 3-4 times if an error occurs for a specific URL.
        If it still fails, then consider checking the log for more details. 
-  Undetected driver replaced by SeleniumBase (Bypass Cloudflare Turnstile) (v3.5.7+)
-  Run with --install arg again OR manually clean install (v3.5.7+)
-  Delete the old UserDataDir (v3.5.5+)
-  No existing Chrome browser should be running in the background (v3.4.2+)     
-  Redownload Chrome Binary and Chrome driver. (v3.4.2+)
-  If Undetected/SeleniumBase does not work then UNCHECK and use default webdriver. (v3.4.2+)
2. Mail notifications: to receive a status notification by mail, set it up in /src/Main/MailNotify.py
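
As a rough illustration of the AutoResume behaviour listed in the update notes above, here is a minimal sketch of a retry loop. It is not the scraper's actual code: scrape_with_retries, scrape_topic, max_attempts, and delay_seconds are hypothetical names, and the default of 3 attempts is only an assumption based on the retry count mentioned in the notes.

```python
import time
from typing import Callable


def scrape_with_retries(
    scrape_topic: Callable[[str], None],  # hypothetical callable that scrapes one URL
    url: str,
    max_attempts: int = 3,                # assumed default, not taken from the project
    delay_seconds: float = 0.0,
) -> bool:
    """Call scrape_topic(url) until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            scrape_topic(url)
            return True
        except Exception as error:
            # Report the failure so it can be checked later, as the notes suggest.
            print(f"Attempt {attempt}/{max_attempts} failed for {url}: {error}")
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    return False
```

If the function returns False, every attempt failed, which corresponds to the case where the notes suggest checking the log for details.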

To view the downloaded courses, you can use the Educative-Viewer repository, which provides a more readable, user-friendly interface for the downloaded course content.

Steps to use the scraper:

Prerequisites:

Git
Python 3.12 or later
OS: Win(x86/x64) - Mac(ARM64/x64) - Linux(ARM64/x64)

Clone this project and cd into its directory.

git clone https://github.com/anilabhadatta/educative.io_scraper.git
cd educative.io_scraper

Run the following commands to start Educative Scraper.

Automatic Steps:

Use python3 instead of python for Linux and MacOS.

python setup.py --install
python setup.py --run

[Commands]
--install: Creates a virtual environment and installs the required dependencies.
--run: Activates the environment and starts the scraper. [Default = True]
--create: Creates a shortcut executable file linked to the scraper directory.
      
      If the git repository is moved to a different location after creating
      the executable then recreate it again to set the new repository path.

Manual Steps:

Windows:

pip install virtualenv
python -m venv env <or> virtualenv env
env\Scripts\activate
pip install -r requirements.txt

python EducativeScraper.py                 (For UI)
python EducativeScraper.py --loginbrowser  (Open browser for login to account)
python EducativeScraper.py --terminal      (For Terminal)
python EducativeScraper.py --help          (For Config and Help info)

MacOS/Linux:

pip3 install virtualenv
python3 -m venv env <or> virtualenv env
source env/bin/activate
pip3 install -r requirements.txt

python3 EducativeScraper.py                 (For UI)
python3 EducativeScraper.py --loginbrowser  (Open browser for login to account)
python3 EducativeScraper.py --terminal      (For Terminal)
python3 EducativeScraper.py --help          (For Config and Help info)
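
The manual steps differ between Windows and MacOS/Linux mainly in where the virtual environment keeps its interpreter. The sketch below shows one way to build the same commands from Python without activating the environment first. The helper names venv_python and scraper_command are hypothetical and not part of the project; the directory name env and the flags come from the steps above.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(env_dir: str = "env", platform: str = sys.platform):
    """Return the interpreter path inside a virtual environment."""
    if platform.startswith("win"):
        # Windows virtual environments keep executables under Scripts\
        return PureWindowsPath(env_dir) / "Scripts" / "python.exe"
    # MacOS and Linux virtual environments keep executables under bin/
    return PurePosixPath(env_dir) / "bin" / "python"


def scraper_command(*flags: str, env_dir: str = "env",
                    platform: str = sys.platform) -> list[str]:
    """Build the argument list for running EducativeScraper.py with the given flags."""
    return [str(venv_python(env_dir, platform)), "EducativeScraper.py", *flags]
```

The returned list can be passed to subprocess.run from the project directory, for example subprocess.run(scraper_command("--help")).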

Run the help command to learn about the config setup for terminal-based scraping before starting the scraper.

Recommended GUI Settings

After the GUI loads, follow these steps.
- Create a text file.
- Copy the URLs of the first topic/lesson from any number of courses.
- Paste all the URLs into the text file and save it.
- Select a configuration if you prefer not to use the default configuration.
- If you prefer not to display the browser window, choose the headless option.
- Please provide a unique User Data Directory name that the browser will use to store your current session. Ensure that each instance of the scraper has a distinct User Data Directory name.
- Please select the file path of the text file containing the course URLs, as well as the directory where you would like to save the downloaded content.
- You can choose to save/export the current configuration for later use, or you can opt for the default configuration.
- For the initial setup or updates, click on Download Chromedriver and Download Chrome Binary to automatically download them into the project directory.
- If you intend to use proxies, enable the proxy option and enter the proxy in the proxies box.
  - For IP authorized proxy, you can directly enter IP:PORT of the proxy.
  - For USER:PASS authorized proxy, you'll need to create a localhost tunnel using the Proxy-Login-Automator repository.
  - After setting up the tunnel, enter the IP:PORT of the localhost proxy that you configured in the Proxy Login Automator.
- Click on Login Account to log in to your Educative.io account and click on Close Browser Button to close the browser after the login is completed.
- Click on Start Scraper to begin scraping the courses.
- The scraper will automatically stop after scraping all the URLs in the selected text file.
- If you stop the scraper with the Stop Scraper Button before it finishes, or it runs into an error, the most recent URL is saved in the EducativeScraper.log file. Copy that URL from the INFO logger and use it to replace the URL of the topic/lesson that has already been completed. This lets you resume the scraper from where you left off.
- An index is NOT required in the URLs text file. Simply paste the URLs of the topics from which you want to start/resume scraping.
- Added new function to auto scraper
  - Can automatically scrape Cloudlabs and Projects links added in text file.
  - Select ModuleType [CLOUDLABS/PROJECTS/COURSE-PATH]
- Added Manual Scraper Button (used to scrape a specific topic opened in the browser)
  - Important: Disable Seleniumbase checkbox for this.
  - Open the browser using Login/Open Browser, then search for a topic in the opened tab.
  - AutoNext checkbox will work only with Manual Scraper. This will automatically scrape consecutive topics and will finish at the end of the topic of that specific course.
  - To stop, click on close browser.
  - Cloudlabs/Projects can be scraped using this.
  - Change configuration as per topic type in UI field: ModuleType
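
The manual resume step described above (replacing a course's URL with the last URL from the log) can be sketched as follows. This is only an illustration, useful mainly when AutoFixUrl is unchecked. The names course_key and resume_from are hypothetical, and the sketch assumes that two URLs belong to the same course when their paths match up to the last segment.

```python
from urllib.parse import urlparse


def course_key(url: str) -> str:
    """Return the URL path without its last segment (assumed to identify the course)."""
    path = urlparse(url.strip()).path.rstrip("/")
    return path.rsplit("/", 1)[0]


def resume_from(urls: list[str], last_logged_url: str) -> list[str]:
    """Replace the entry for the same course with the last URL found in the log."""
    key = course_key(last_logged_url)
    return [last_logged_url if course_key(url) == key else url for url in urls]


# Example with made-up course and topic names.
urls = [
    "https://www.educative.io/courses/course-a/intro",
    "https://www.educative.io/courses/course-b/intro",
]
updated = resume_from(urls, "https://www.educative.io/courses/course-a/topic-5")
```

Here only the course-a entry is replaced; the course-b entry is left unchanged. The updated list would then be written back to the URLs text file.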
