Open
Commits
243 commits
620549c
new pull request because I suck
hwamil May 14, 2021
7636050
moved assigning next item to the bottom of while loop since AmazonMon…
hwamil May 14, 2021
a0263dd
improved BadProxyCollector
hwamil May 15, 2021
88d7aa5
made bad_proxies.json parameters more clear
hwamil May 15, 2021
7a287b2
proxy collector now deletes an unbanned proxy if it has been good for…
hwamil May 15, 2021
e5326e5
Fixed a huge mistake in item assignment thanks to mbbush
hwamil May 15, 2021
acd430e
changed ItemsHandler assign_next_item method name to pop, although it…
hwamil May 15, 2021
7e27c82
more methods for collector. gotta figure out how to save to file ONCE…
hwamil May 15, 2021
a68b3bf
added 'issaver' kwarg to AmazonMonitor. After all sessions are create…
hwamil May 15, 2021
6d24ffc
BadProxyCollector is now never instantiated
hwamil May 15, 2021
1e12313
BadProxyCollector delay for purging cooled down proxies from json cha…
hwamil May 15, 2021
a8060d7
new proxies.template_json
hwamil May 15, 2021
6eda06e
fixed error in proxies template
hwamil May 15, 2021
75da6cc
added return line in except block at BPC load so that it doesn't loop…
hwamil May 15, 2021
cdec75b
previous commit was a mistake. it is supposed to loop lmao
hwamil May 15, 2021
c25a4f3
if no proxies and sleeping sessions for a minute if 503 response
hwamil May 16, 2021
25cbc73
implemented no proxy mode. uses single session using TCPConnector and…
hwamil May 16, 2021
61569a9
changed 503 time.sleep to await asyncio.sleep
hwamil May 16, 2021
92eed16
503 log change for no proxy mode
hwamil May 16, 2021
3c2d936
Saving user agents to reuse when generating headers for new instances…
hwamil May 16, 2021
ee0f14e
feature to save user_agents for corresponding proxy urls so that sess…
hwamil May 16, 2021
a47a0d0
merged headers branch, adding user_agents save feature
hwamil May 16, 2021
31148af
badproxycollector timer bug fix
hwamil May 16, 2021
a511e02
BPC fixed attribute error
hwamil May 16, 2021
b9f68b4
experimental feature stagger
hwamil May 17, 2021
265248c
add init_sleep parameter using idx in monitor initialization and use …
hwamil May 17, 2021
3183587
fixed unexpected kwarg
hwamil May 17, 2021
ce40261
stagger using len of item_list instead of n of proxies
hwamil May 17, 2021
15beaea
realized that cycling len of item list defeats the purpose. using ran…
hwamil May 17, 2021
92ef86e
random.choice may generate too many repeating numbers. going back to …
hwamil May 17, 2021
1e86ef8
threw all previous methods away and just used proxies // len(item_lis…
hwamil May 17, 2021
f1ef100
added init_sleep attr to fail_recreate method
hwamil May 17, 2021
253a6ec
abandoned previous method and placed staggering algo right into the I…
hwamil May 17, 2021
ba043d0
trying another method of staggering
hwamil May 17, 2021
cdf2b28
fixed unpack error caused by remnants of previous method
hwamil May 17, 2021
7269253
fixed mistake of updating access time when returning true on itemshan…
hwamil May 17, 2021
0ed906c
changed method name
hwamil May 17, 2021
1368e33
changed next item flow by accident
hwamil May 17, 2021
183ac89
fixed BPC bug
hwamil May 17, 2021
3f458ca
fixed BPC bug 2
hwamil May 17, 2021
40561fc
increased bad proxy wait from 1 min to 5 min
hwamil May 17, 2021
04c1f35
Removed issaver flag from Monitoring and moved it into queue. Also re…
hwamil May 17, 2021
13ba193
simplified BPC so that it only shows currently bad proxies with last …
hwamil May 18, 2021
75e9345
blackd
hwamil May 18, 2021
52b8817
Realized BPC being called could break program when user has no proxies
hwamil May 18, 2021
2012bb8
cleaned up unused parameter checkout_task from AZNHandler and AZNMHan…
hwamil May 18, 2021
d91ec1d
simplified BPC further to just be a list so that it's more readable a…
hwamil May 18, 2021
e5a5901
increased sleep time from 0.1 to 0.5 for last_access_check for item s…
hwamil May 18, 2021
f055c30
pipenv install uvloop for fasteer asyncio
hwamil May 18, 2021
f3fbb4a
changed staggering back to 1s because... why not?
hwamil May 18, 2021
0044216
cleaning bloat
hwamil May 18, 2021
6b826a5
small things
hwamil May 18, 2021
dfb7b9f
moved IH item_ids declarance into create_items_pool block.
hwamil May 18, 2021
9746abd
Update misc.py
hwamil May 19, 2021
1648ec3
WIP
hwamil May 20, 2021
bae1305
uvloop now a click option as Windows is not supported
hwamil May 20, 2021
b31f1a0
migrated tools from misc to monitoring
hwamil May 20, 2021
340f17e
fixed merge conflict
hwamil May 20, 2021
9d695cb
tracking misc.py
hwamil May 20, 2021
80f0a4f
fixed missing bad_proxies path var and changed item delay to be impli…
hwamil May 20, 2021
ad5395a
apparently you can't import asyncio.sleep
hwamil May 20, 2021
06f62a1
changed it so that task_delay changes with the change in the number o…
hwamil May 20, 2021
765018f
small fix for bpc
hwamil May 20, 2021
0ba6320
renamed bpcs
hwamil May 20, 2021
7377d5f
log for task_delay
hwamil May 20, 2021
a10bdc8
fixed error in f string
hwamil May 20, 2021
5cbb9de
bit more info in bpc log
hwamil May 20, 2021
b4c5c08
changed method name for json_url
hwamil May 20, 2021
323fc50
merge conflict resolved
hwamil May 20, 2021
2436bd8
delay / (good_proxies/asins) if len(good_proxies) > len(asins)
hwamil May 20, 2021
d092bb5
trying different flow
hwamil May 20, 2021
c7e4f31
time.sleep severely impacts performance. Shouldn't be a surprise :/
hwamil May 20, 2021
fb4fd61
took out first item condition
hwamil May 20, 2021
c707989
small changes to check_last_access
hwamil May 20, 2021
db8ed35
commit before merging rotation
hwamil May 20, 2021
e5a67b6
conflict resolved
hwamil May 20, 2021
b40eb98
use same headers when recreating. fake_user_agent causing too many op…
hwamil May 20, 2021
d102ad5
reduced proxies.json layer
hwamil May 20, 2021
94c1cb8
proxies.template_json
hwamil May 20, 2021
5c2171f
queue.put to put_nowait so that it doesn't get blocked
hwamil May 20, 2021
96aee13
had to await last_access_check for it to work properly
hwamil May 20, 2021
01f19e6
changed item stagger to a task stagger implementation where it checks…
hwamil May 20, 2021
d8a4442
changed rest_time format from rounding to two decimal places to no ro…
hwamil May 20, 2021
ebadea7
changed formatting one more time to milliseconds with rounding
hwamil May 20, 2021
9eb3dca
set last_task before sleeping
hwamil May 20, 2021
d9c1569
working build
hwamil May 20, 2021
f7f49bd
tested and working with multiple groups of proxies
hwamil May 20, 2021
ac7c311
time.sleep to async.sleep. realized even if it's a very small amount …
hwamil May 20, 2021
3df8685
got rid of bpc.save since you can find it in log and it slows down mo…
hwamil May 20, 2021
8a9fedb
reducing bpc even further. may just get rid of it altogether. seems u…
hwamil May 20, 2021
fda822d
preload proxy group to get rid of ramp up
hwamil May 20, 2021
485af97
Revert "preload proxy group to get rid of ramp up"
hwamil May 20, 2021
891ed0b
failed to preload proxies and reverted. little hacky but it'll do for…
hwamil May 20, 2021
1bfbcf4
switched to time.sleep again. see what happens
hwamil May 20, 2021
494c812
just going back to asyncio.sleep cuz I don't know any better
hwamil May 20, 2021
9319305
threw out dynamic staggering and just implemented initial staggering …
hwamil May 20, 2021
a3485d2
change non-active groups sleep to delay so it doesn't hang the progra…
hwamil May 20, 2021
16cf946
got rid of unnecessary redundancies
hwamil May 20, 2021
021a0b3
.
hwamil May 20, 2021
5a6d7dd
..
hwamil May 20, 2021
7f0c314
fixed broken timer
hwamil May 20, 2021
bf1704b
I'm too tired to know what I did
hwamil May 20, 2021
2898e71
Merge branch 'rotation_v2.1' into atc_json
hwamil May 21, 2021
a6bf8fb
merged rotation
hwamil May 21, 2021
4c2d7a9
moved uvloop to dev packages so normal users on windows don't have to…
hwamil May 21, 2021
0fe6968
almost implemented
hwamil May 21, 2021
f1df53f
clunky but working
hwamil May 21, 2021
f7e19b7
Merge branch 'atc_json' into rotation_v2.1
hwamil May 21, 2021
16d8177
tiny touch up
hwamil May 21, 2021
ef611b5
timers in the blocks so we're not hitting super fast
hwamil May 21, 2021
779438d
.
hwamil May 21, 2021
537733a
conflict resolved
hwamil May 21, 2021
1004703
checks and balances
hwamil May 21, 2021
709769a
neatify logs
hwamil May 21, 2021
b4c7b17
placed another check so that if json request gets 503'd it moves onto…
hwamil May 21, 2021
bed40d6
blackd
hwamil May 21, 2021
98c8b51
omg this is clunky as hell
hwamil May 21, 2021
d1e33c3
TypeError bandaid in validate_session
hwamil May 21, 2021
d9e93b9
big bandaid for nonetype error
hwamil May 21, 2021
9db43be
tree is not None
hwamil May 21, 2021
0e38e37
proxy next to dict to see confirm change
hwamil May 21, 2021
0957b32
lol coffeebeans freaked me out
hwamil May 21, 2021
1b5fb14
changed test product to a card so sessions start empty lmao
hwamil May 21, 2021
fe22a2e
stopping validation every time the loop starts
hwamil May 21, 2021
dc91dd8
queue.put before save_html, duh
hwamil May 21, 2021
78abf2a
Update amazon_monitoring.py
hwamil May 21, 2021
e470643
clearing json_dict before get to confirm we are getting valid json pa…
hwamil May 22, 2021
16c9a4a
fixed turbo_ini params. confirms it initiates correctly. received emp…
hwamil May 22, 2021
6696154
more logssss cuz we don't have enoughhhh
hwamil May 22, 2021
f9c9eee
remember me checkbox fix taken from calebchongc
hwamil May 22, 2021
2c8002e
fixed confusing debug log that made it appear that it runs ajax on ev…
hwamil May 22, 2021
ef4e7e6
just making logs look pretty.
hwamil May 22, 2021
fbc3809
turbo_checkout missing domain param added
hwamil May 22, 2021
e1e143c
domain -> self.amazon_domain
hwamil May 22, 2021
4dbbeeb
bring me more logs!
hwamil May 22, 2021
7090f5d
isOK check added
hwamil May 22, 2021
52585c0
isOK check added2
hwamil May 22, 2021
fda3247
unsplitted turbo init branching for json and ajax methods
hwamil May 22, 2021
7f776ac
little clean up
hwamil May 22, 2021
5134846
using context manager (async with session.get(url) as r) to validate …
hwamil May 22, 2021
40dd1ad
more cleaning
hwamil May 22, 2021
5f4897f
get revalidated if CSRF Error
hwamil May 22, 2021
30e0543
continue after failing validation
hwamil May 22, 2021
ff565b4
ItemsHandler now adds back items that's been removed after turboing i…
hwamil May 22, 2021
1dbc92e
offerid.template_json
hwamil May 22, 2021
4c33718
ItemHandler timer increased from 10 min to 60 min
hwamil May 22, 2021
538faea
asyncio InvalidStateError bandaid exception catching and timer reset …
hwamil May 22, 2021
efc91ba
clear out removed items list on ItemsHandler.refresh
hwamil May 22, 2021
a6521ef
I'm stupid
hwamil May 22, 2021
a547e79
one less line for the same result. bliss
hwamil May 22, 2021
bb0c46a
Figured out that ValueError: not in list was happening because multip…
hwamil May 23, 2021
f59e886
Merge branch 'alpha' of github.com:Hari-Nagarajan/fairgame into rotat…
hwamil May 23, 2021
aa0beda
cleaned off bpc
hwamil May 23, 2021
a795d85
ValueError exception handling modified
hwamil May 23, 2021
d8d206b
moved queue.put() and save_html into try block instead of outside of …
hwamil May 23, 2021
fd66e60
forgot the f in f-string
hwamil May 23, 2021
933d240
log for checking whether turbo init is getting SellerDetail or offerid
hwamil May 23, 2021
1d121a4
trying to catch StopIteration
hwamil May 23, 2021
11aa281
exception StopIteration catching at next_item method
hwamil May 23, 2021
1000e05
stop logging ValueError
hwamil May 23, 2021
2693d7f
random delay
hwamil May 23, 2021
ce94705
asyncio.TimeoutError exception catching
hwamil May 23, 2021
d747451
log for timeouterror
hwamil May 23, 2021
c18d5ad
lmao infinite delay by mistake. fixed
hwamil May 23, 2021
30b9bee
some sleeping cushions in validation process
hwamil May 23, 2021
0ec1fed
changed randint range to 0,4
hwamil May 23, 2021
58dc858
all exception catching try/except block at cli.py just as a hotfix
hwamil May 23, 2021
89045ab
some changes that I can't remember
hwamil May 24, 2021
b392ea4
fail_recreate seems to cause the crashes (for unknown reasons to me s…
hwamil May 24, 2021
7d858f2
I can't count
hwamil May 24, 2021
fa0298e
getting rid of staggering since now there is random delay
hwamil May 24, 2021
9db4e1c
Reset fail_counter after cooldown
hwamil May 24, 2021
d3a4275
Now using fake_headers module to generate random headers for sessions
hwamil May 24, 2021
f71bd22
dependencies for fake_headers
hwamil May 24, 2021
bcf1149
max fail from 10 to 5
hwamil May 24, 2021
8781e49
current_group_proxies dust cleaned
hwamil May 24, 2021
84c1d34
changed loglevel for ajax so we can see non-offerid items in aioconfig
hwamil May 24, 2021
ca09532
you know i love logs
hwamil May 24, 2021
cd0861f
Trying to catch the reason for 'can't extract image from plain/text' …
hwamil May 24, 2021
fe13f56
amazoncaptcha.exceptions.ContentTypeError was the culprit. Why now? W…
hwamil May 24, 2021
1895e6e
catching typeerror
hwamil May 25, 2021
df3c81d
Linux specific Errno 32, for python it becomes IOError. Putting entir…
hwamil May 25, 2021
039099b
merged hari/alpha with TheTabKey's commit
hwamil May 25, 2021
f5bf60e
proccesspoolexecutor added instead of trying to priorityqueue
hwamil May 26, 2021
ece3a80
merged hari/alpha: --use-proxies flag
hwamil May 26, 2021
ee5ea35
more resolving with main alpha branch and adding offerid flag
hwamil May 26, 2021
8627fec
turned on debug logs
hwamil May 26, 2021
dbb72cb
merged processpoolexecutor method
hwamil May 26, 2021
0665916
blackd
hwamil May 26, 2021
5069651
trying multiprocessing on captcha solving as well
hwamil May 26, 2021
04b943e
increased proxy cooldown time from 1 hr to 6 hrs to lessen chance of …
hwamil May 26, 2021
e09878e
took captcha solving off of multiprocessing.
hwamil May 26, 2021
d6ca7ed
added back init stagger to potentially alleviate IO congestion... pre…
hwamil May 26, 2021
8812eca
changed so that asyncio.gather(monitors) executes first without being…
hwamil May 26, 2021
487b1d3
went back to the old way of submitting to process pool as it made mor…
hwamil May 26, 2021
4f4e527
misc
hwamil May 26, 2021
b477cae
misc2
hwamil May 26, 2021
1718264
misc3
hwamil May 26, 2021
f007999
trying out asyncio's own run_in_executor for parallel computing. capt…
hwamil May 27, 2021
45a8b85
deleted concurrent.futures import line. one thing to note is that run…
hwamil May 27, 2021
56408ef
limit captcha solving workers to 2 so that other two can run checkout…
hwamil May 27, 2021
b53241c
just putting run_in_executor on everything I can since I'm getting 40…
hwamil May 30, 2021
ed4e1ea
take get_qualified_seller off executor
hwamil Jun 4, 2021
1cf50e0
made small change to source code for amazoncaptcha so copied the libr…
hwamil Jun 4, 2021
f7796a4
placed sleep in while loop in validate_session
hwamil Jun 4, 2021
3235aa9
using fake-headers since mobile headers ain't doing shit
hwamil Jun 4, 2021
080bc92
experimenting with captcha solves and whether it can get past bot det…
hwamil Jun 4, 2021
f2030b9
instead of sleeping an hour and resuming now 10 fails will make a pro…
hwamil Jun 4, 2021
9996c8b
updating log statements
hwamil Jun 4, 2021
ec03aee
dem logs
hwamil Jun 4, 2021
45f0a7e
checks status along with tree so we don't unnecessarily call get_sellers
hwamil Jun 4, 2021
8682896
dem logss
hwamil Jun 4, 2021
58f852b
less noise in the logs
hwamil Jun 4, 2021
363bd6e
randomize sleep times so program not bombarded all at once when coold…
hwamil Jun 5, 2021
e3d6019
50 tries seem okay but with a lot of proxies getting captchas it real…
hwamil Jun 5, 2021
ce0dbf2
add to bad proxies list when failing validation
hwamil Jun 5, 2021
3caa800
changed fail sleep time between 5-10 minutes instead of 30-60 minutes
hwamil Jun 5, 2021
7786572
added captchaaio
hwamil Jun 5, 2021
04d342e
undoing multiprocessing on checkout_worker since some users complaine…
hwamil Jun 6, 2021
a0a39ba
await gather
hwamil Jun 6, 2021
c378ade
modify logs
hwamil Jun 6, 2021
226c0cf
got rid of cooldown for 503 since json method gets 200 while ajax get…
hwamil Jun 6, 2021
05b6943
remove proxy from badproxies list if it becomes validated
hwamil Jun 6, 2021
b14e86f
clean up
hwamil Jun 6, 2021
151d7c1
trying another way to run checkout_worker in parallel
hwamil Jun 6, 2021
b111eef
apparently if you don't set process executor for run_in_executor it d…
hwamil Jun 6, 2021
a304f68
captcha max_workers to half cpu_count
hwamil Jun 6, 2021
96d442d
captcha max try back to 25
hwamil Jun 6, 2021
956006e
infinite (1000) captcha tries
hwamil Jun 7, 2021
0900119
changes I can't remember.
hwamil Jun 8, 2021
d2a7566
Merge branch 'alpha' of github.com:Hari-Nagarajan/fairgame into rotat…
hwamil Jun 8, 2021
574ef28
merged alpha
hwamil Jun 8, 2021
ea3975f
no hard-coded domain
hwamil Jun 8, 2021
0a17616
Merge branch 'alpha' of github.com:Hari-Nagarajan/fairgame into rotat…
hwamil Jun 8, 2021
12e1419
fix lack of scheme -Dakk
hwamil Jun 8, 2021
6ebd2d5
merged alpha
hwamil Jun 8, 2021
526d462
I'm getting way too many captchas. gotta filter out the good ones
hwamil Jun 8, 2021
8b5f765
tiny fix for good_proxies.json len
hwamil Jun 8, 2021
1c0343e
more proxies list changes
hwamil Jun 8, 2021
0a7143b
.
hwamil Jun 8, 2021
ead4438
pass monitoring session into captcha solver so that it doesn't use ho…
hwamil Jun 8, 2021
26d6127
log and html save changes
hwamil Jun 11, 2021
4434e3b
.
hwamil Jun 28, 2021
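
Several commits above switch between `time.sleep` and `asyncio.sleep` (61569a9, c7e4f31, ac7c311). The sketch below is not code from this PR; it only illustrates, under the assumption of three hypothetical worker coroutines, why a blocking sleep inside a coroutine serializes tasks that are supposed to run concurrently.

```python
import asyncio
import time


async def blocking_worker(delay: float) -> None:
    # time.sleep blocks the whole event loop, so no other task runs meanwhile.
    time.sleep(delay)


async def cooperative_worker(delay: float) -> None:
    # asyncio.sleep yields control, so the other tasks sleep at the same time.
    await asyncio.sleep(delay)


async def timed(worker, count: int = 3, delay: float = 0.2) -> float:
    """Run `count` workers concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(count)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_worker))
cooperative_elapsed = asyncio.run(timed(cooperative_worker))
print(f"time.sleep: {blocking_elapsed:.2f}s, asyncio.sleep: {cooperative_elapsed:.2f}s")
```

With the blocking variant the delays add up (roughly count times delay), while the cooperative variant takes about one delay in total.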
1 change: 1 addition & 0 deletions .gitignore
@@ -24,3 +24,4 @@ logs/*.log*
tags
stores/store_data/item_cache.p
Amazon_aio.bat
myutils
5 changes: 5 additions & 0 deletions Pipfile
@@ -5,6 +5,7 @@ verify_ssl = true

[dev-packages]
pyinstaller = "*"
uvloop = "*"

[packages]
requests = {extras = ["socks"], version = "*"}
@@ -39,6 +40,10 @@ dnspython = "*"
fake-useragent = "*"
pysocks = "*"
aiohttp-proxy = "*"
uvloop = "*"
fake-headers = "*"
bs4 = "*"
html5lib = "*"

[requires]
python_version = "3.8"
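
The Pipfile now pulls in `uvloop`, and commit bae1305 notes that Windows is not supported. A minimal sketch of a guarded import follows, assuming a hypothetical helper name `try_install_uvloop`; the PR itself exposes this as a click option, whose code is not shown here.

```python
import asyncio
import sys


def try_install_uvloop() -> bool:
    """Switch asyncio to uvloop when available; return True on success."""
    if sys.platform == "win32":
        # uvloop does not support Windows, so keep the default loop there.
        return False
    try:
        import uvloop  # optional dependency
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


async def main() -> str:
    await asyncio.sleep(0)
    return "ok"


using_uvloop = try_install_uvloop()
result = asyncio.run(main())
print(f"uvloop active: {using_uvloop}, result: {result}")
```

The program behaves the same either way; only the event loop implementation changes.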
46 changes: 46 additions & 0 deletions amazoncaptcha_aio/__init__.py
@@ -0,0 +1,46 @@
# -*- coding: utf-8 -*-

"""Solver for Amazon's image captcha.

The motivation behind the creation of this library is taking its start from
the genuinely simple idea: "I don't want to use pytesseract or some other
non-amazon-specific OCR services, nor do I want to install some executables to
just solve a captcha. I desire to get a solution within 1-2 lines of code
without any heavy add-ons. Using a pure Python."

Examples:
    Browsing Amazon using selenium and stuck on captcha? The class method
    below will do all the "dirty" work of extracting an image from the webpage
    for you. Practically, it takes a screenshot from your webdriver, crops the
    captcha, and stores it into bytes array, which is then used to create an
    AmazonCaptcha instance. This also means avoiding any local savings.

        from amazoncaptcha import AmazonCaptcha
        from selenium import webdriver

        driver = webdriver.Chrome() # This is a simplified example
        driver.get('https://www.amazon.com/errors/validateCaptcha')

        captcha = AmazonCaptcha.fromdriver(driver)
        solution = captcha.solve()


    If you are not using selenium or the previous method is not just the case for
    you, it is possible to use a captcha link directly. This class method will
    request the url, check the content type and store the response content into bytes
    array to create an instance of AmazonCaptcha.

        from amazoncaptcha import AmazonCaptcha

        link = 'https://images-na.ssl-images-amazon.com/captcha/usvmgloq/Captcha_kwrrnqwkph.jpg'

        captcha = AmazonCaptcha.fromlink(link)
        solution = captcha.solve()

"""

from .solver import AmazonCaptcha
from .devtools import AmazonCaptchaCollector
from .exceptions import ContentTypeError, NotFolderError

#--------------------------------------------------------------------------------------------------------------
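
The docstring above says `fromlink` requests the url, checks the content type and stores the response content in a bytes array. The solver module is not part of this excerpt, so the following is only a sketch of that check. The helper name `image_buffer` and the list of accepted content types are assumptions, and `ContentTypeError` is a local stand-in that mirrors the class in `exceptions.py`.

```python
from io import BytesIO

# Assumption: the accepted types are illustrative; the solver's real list is not shown.
SUPPORTED_CONTENT_TYPES = ("image/jpeg", "image/png")


class ContentTypeError(Exception):
    """Local stand-in mirroring amazoncaptcha_aio.exceptions.ContentTypeError."""

    def __init__(self, content_type, message='is not supported as a Content-Type. Cannot extract the image.'):
        self.content_type = content_type
        self.message = message

    def __str__(self):
        return f'"{self.content_type}" {self.message}'


def image_buffer(content_type: str, body: bytes) -> BytesIO:
    """Hypothetical helper: validate the header, then wrap the body in BytesIO."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in SUPPORTED_CONTENT_TYPES:
        raise ContentTypeError(content_type)
    return BytesIO(body)


buffer = image_buffer("image/jpeg", b"\xff\xd8\xff")
print(buffer.getvalue())

try:
    image_buffer("text/plain", b"blocked")
    rejected = ""
except ContentTypeError as error:
    rejected = str(error)
print(rejected)
```

A `text/plain` response is what commits cd0861f and fe13f56 describe running into, which is why the rejection path is exercised here.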
8 changes: 8 additions & 0 deletions amazoncaptcha_aio/__version__.py
@@ -0,0 +1,8 @@
__title__ = 'amazoncaptcha'
__description__ = "Pure Python, lightweight, Pillow-based solver for Amazon's text captcha."
__url__ = 'https://github.com/a-maliarov/amazoncaptcha'
__version__ = '0.5.0'
__author__ = 'Anatolii Maliarov'
__author_email__ = 'tly.mov@gmail.com'
__license__ = 'MIT'
__copyright__ = 'Copyright 2020 Anatolii Maliarov'
150 changes: 150 additions & 0 deletions amazoncaptcha_aio/devtools.py
@@ -0,0 +1,150 @@
# -*- coding: utf-8 -*-

"""
amazoncaptcha.devtools
~~~~~~~~~~~~~~~~~~~~~~

This module contains the set of amazoncaptcha's devtools.
"""

from .solver import AmazonCaptcha
from .exceptions import NotFolderError
from .__version__ import __version__

from io import BytesIO
import multiprocessing
import requests
import os

#--------------------------------------------------------------------------------------------------------------

class AmazonCaptchaCollector(object):

    def __init__(self, output_folder_path, keep_logs=True, accuracy_test=False):
        """
        Initializes the AmazonCaptchaCollector instance.

        Args:
            output_folder_path (str): Folder where images or logs should be stored.
            keep_logs (bool, optional): If set to True, unsolved captcha links
                will be stored separately.
            accuracy_test (bool, optional): If set to True, AmazonCaptchaCollector
                will not download images, but just solve them and log the results.

        """

        self.output_folder = output_folder_path
        self.keep_logs = keep_logs
        self.accuracy_test = accuracy_test

        if not os.path.exists(self.output_folder):
            os.mkdir(self.output_folder)

        elif not os.path.isdir(self.output_folder):
            raise NotFolderError(self.output_folder)

        self.collector_logs = os.path.join(self.output_folder, f'collector-logs-{__version__.replace(".", "")}.log')
        self.test_results = os.path.join(self.output_folder, 'test-results.log')
        self.not_solved_logs = os.path.join(self.output_folder, 'not-solved-captcha.log')

    def _extract_captcha_link(self, captcha_page):
        """Extracts a captcha link from an html page.

        Args:
            captcha_page (requests.Response): Response for the captcha page;
                its `.text` attribute holds the page's html.

        Returns:
            str: Captcha link.

        """

        return captcha_page.text.split('<img src="')[1].split('">')[0]

    def _extract_captcha_id(self, captcha_link):
        """
        Extracts a captcha id from a captcha link.

        Args:
            captcha_link (str): A link to the captcha image.

        Returns:
            str: Captcha ID.

        """

        return ''.join(captcha_link.split('/captcha/')[1].replace('.jpg', '').split('/Captcha_'))

    def get_captcha_image(self):
        """
        Requests the page with Amazon's captcha, gets random captcha.
        Creates AmazonCaptcha instance, stores an original image before solving.

        If it is not an accuracy test, the image will be stored in a specified
        folder with the solution within its name. Otherwise, only the logs
        will be stored, mentioning the captcha link being processed and the result.

        """

        captcha_page = requests.get('https://www.amazon.com/errors/validateCaptcha')
        captcha_link = self._extract_captcha_link(captcha_page)

        response = requests.get(captcha_link)
        captcha = AmazonCaptcha(BytesIO(response.content))
        captcha._image_link = captcha_link
        original_image = captcha.img

        solution = captcha.solve(keep_logs=self.keep_logs, logs_path=self.not_solved_logs)
        log_message = f'{captcha.image_link}::{solution}'

        if solution != 'Not solved' and not self.accuracy_test:
            print(log_message)
            captcha_name = 'dl_' + self._extract_captcha_id(captcha.image_link) + '_' + solution + '.png'
            original_image.save(os.path.join(self.output_folder, captcha_name))

        else:
            print(log_message)
            with open(self.collector_logs, 'a', encoding='utf-8') as f:
                f.write(log_message + '\n')

    def _distribute_collecting(self, milestone):
        """Distribution function for multiprocessing."""

        for step in milestone:
            self.get_captcha_image()

    def start(self, target, processes):
        """
        Starts the process of collecting captchas or conducting a test.

        Args:
            target (int): Number of captchas to be processed.
            processes (int): Number of simultaneous processes.

        """

        goal = list(range(target))
        milestones = [goal[x: x + target // processes] for x in range(0, len(goal), target // processes)]

        jobs = list()
        for j in range(processes):
            p = multiprocessing.Process(target=self._distribute_collecting, args=(milestones[j], ))
            jobs.append(p)
            p.start()

        for proc in jobs:
            proc.join()

        if self.accuracy_test:
            with open(self.collector_logs, 'r', encoding='utf-8') as f:
                output = f.readlines()

            all_captchas = len(output)
            solved_captchas = len([i for i in output if 'Not solved' not in i])
            success_percentage = round((solved_captchas / all_captchas) * 100, 5)
            result = f'::Test::Ver{__version__}::Cap{all_captchas}::Per{success_percentage}::'

            with open(self.test_results, 'w', encoding='utf-8') as f:
                print(result)
                f.write(result)

#--------------------------------------------------------------------------------------------------------------
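
Two pieces of the collector can be exercised without network access. The sketch below copies the string slicing from `_extract_captcha_id` into a free function and applies it to the link quoted in the package docstring. It then shows a hypothetical alternative, `split_evenly`, to the milestone slicing in `start`. That slicing steps by `target // processes`, so with `target=10` and `processes=4` it builds five slices of two and only the first four are started, and with `target < processes` the step is zero and `range` raises `ValueError`. Neither function name comes from the library.

```python
def extract_captcha_id(captcha_link: str) -> str:
    """Same string slicing as AmazonCaptchaCollector._extract_captcha_id."""
    path = captcha_link.split('/captcha/')[1].replace('.jpg', '')
    return ''.join(path.split('/Captcha_'))


def split_evenly(target: int, processes: int) -> list:
    """Hypothetical alternative to the milestone slicing in start().

    Returns exactly `processes` ranges whose lengths sum to `target`,
    so no captcha is dropped when the division leaves a remainder.
    """
    if processes < 1:
        raise ValueError('processes must be at least 1')
    base, extra = divmod(target, processes)
    ranges, begin = [], 0
    for index in range(processes):
        # The first `extra` workers take one additional item each.
        size = base + (1 if index < extra else 0)
        ranges.append(range(begin, begin + size))
        begin += size
    return ranges


link = 'https://images-na.ssl-images-amazon.com/captcha/usvmgloq/Captcha_kwrrnqwkph.jpg'
captcha_id = extract_captcha_id(link)
chunk_sizes = [len(chunk) for chunk in split_evenly(10, 4)]
print(captcha_id)   # usvmgloqkwrrnqwkph
print(chunk_sizes)  # [3, 3, 2, 2]
```

Each range could be passed to `_distribute_collecting` in place of `milestones[j]`; whether that change is wanted in a vendored copy is a separate decision.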
38 changes: 38 additions & 0 deletions amazoncaptcha_aio/exceptions.py
@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-

"""
amazoncaptcha.exceptions
~~~~~~~~~~~~~~~~~~~~~~~~

This module contains the set of amazoncaptcha's exceptions.
"""

#--------------------------------------------------------------------------------------------------------------

class ContentTypeError(Exception):
    """
    Requested url, which was supposed to be the url to the captcha image
    contains unsupported content type within response headers.
    """

    def __init__(self, content_type, message='is not supported as a Content-Type. Cannot extract the image.'):
        self.content_type = content_type
        self.message = message

    def __str__(self):
        return f'"{self.content_type}" {self.message}'

class NotFolderError(Exception):
    """
    Given path, which was supposed to be a path to the folder, where
    script can store images, is not a folder.
    """

    def __init__(self, path, message='is not a folder. Cannot store images there.'):
        self.path = path
        self.message = message

    def __str__(self):
        return f'"{self.path}" {self.message}'

#--------------------------------------------------------------------------------------------------------------
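
`NotFolderError` is raised by the collector's `__init__` when the output path exists but is not a directory. A minimal sketch of that check in isolation, assuming a hypothetical helper `ensure_folder` and a local stand-in for the exception class so the example runs without the package installed:

```python
import os
import tempfile


class NotFolderError(Exception):
    """Local stand-in mirroring amazoncaptcha_aio.exceptions.NotFolderError."""

    def __init__(self, path, message='is not a folder. Cannot store images there.'):
        self.path = path
        self.message = message

    def __str__(self):
        return f'"{self.path}" {self.message}'


def ensure_folder(path: str) -> str:
    """Hypothetical helper: same existence checks as the collector's __init__."""
    if not os.path.exists(path):
        os.mkdir(path)
    elif not os.path.isdir(path):
        raise NotFolderError(path)
    return path


with tempfile.TemporaryDirectory() as root:
    # A missing path is created as a folder.
    created = ensure_folder(os.path.join(root, 'captchas'))
    created_is_dir = os.path.isdir(created)

    # An existing regular file is rejected.
    file_path = os.path.join(root, 'notes.txt')
    with open(file_path, 'w', encoding='utf-8') as handle:
        handle.write('not a folder')

    try:
        ensure_folder(file_path)
        error_text = ''
    except NotFolderError as error:
        error_text = str(error)

print(created_is_dir)
print(error_text)
```

Note that `os.mkdir` only creates the last path component, so a missing parent directory still raises `FileNotFoundError`, as it would in the collector.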