Commit afa257a

Merge pull request #64 from my-dev-app/refactor_lib
Refactor lib
2 parents 4e7d7a5 + f963bd5 commit afa257a

20 files changed: +393 −230 lines changed

README.md

Lines changed: 34 additions & 9 deletions
@@ -18,7 +18,7 @@ By undeƒined
# AProxyRelay: An Async Request Library with Proxy Rotation

-AProxyRelay is an asynchronous request library designed for easy data retrieval using various proxy servers. It seamlessly handles proxy rotation, preserves data that fails to be requested, and simplifies API scraping. The library is written in `Python 3.12.1` but is compatible with projects utilizing `Python 3.11.2`.
+AProxyRelay is an asynchronous request library designed for easy data retrieval using various proxy servers. It seamlessly handles proxy rotation, preserves data that fails to be requested, and simplifies API scraping. The library is written in `Python 3.12.2`.

In addition, tested proxies will be shared with other people using this library. The more this library is utilized, the bigger the pool of available proxies.

@@ -33,37 +33,60 @@ AProxyRelay streamlines the process of making asynchronous requests with proxy s
### Example
```py
+# -*- mode: python ; coding: utf-8 -*-
from aproxyrelay import AProxyRelay

+# Note: Duplicates will be removed by the library
targets = [
-    'https://some-website.com/api/app?id=1551360',
-    'https://some-website.com/api/app?id=2072450',
-    'https://some-website.com/api/app?id=1924360',
-    'https://some-website.com/api/app?id=1707870',
-    'https://some-website.com/api/app?id=1839880',
+    'https://gg.my-dev.app/api/v1/proxies/available?zone=US&anonimity=all&protocol=all&page=1&size=100&type=example',
+    'https://gg.my-dev.app/api/v1/proxies/available?zone=DE&anonimity=all&protocol=all&page=1&size=100&type=example',
+    'https://gg.my-dev.app/api/v1/proxies/available?zone=NL&anonimity=all&protocol=all&page=1&size=100&type=example',
+    'https://gg.my-dev.app/api/v1/proxies/available?zone=CA&anonimity=all&protocol=all&page=1&size=100&type=example',
+    'https://gg.my-dev.app/api/v1/proxies/available?zone=AU&anonimity=all&protocol=all&page=1&size=100&type=example',
]

# Initialize proxy relay
proxy_relay = AProxyRelay(
    targets=targets,
    timeout=5,
-    test_proxy=True,
-    test_timeout=10,
-    zone='us',
+    scrape=True,
+    filter=True,
+    zones=['us'],
+    unpack=lambda data, target: data['results'],
+    debug=False,
)

# Fetch data
data = proxy_relay.start()

# Result Queue
print(data.qsize())
+
+while not data.empty():
+    content = data.get()
+    print(content)
+
```

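The `unpack` hook in the example above receives the decoded response (`data`) and the target URL (`target`), and whatever it returns is what ends up on the result queue. A minimal sketch of that behaviour, assuming a hypothetical payload that nests its proxies under a `results` key (the real shape depends on the target API):

```py
# Hypothetical payload; substitute the structure your target API returns.
sample_payload = {
    'results': [
        {'ip': '203.0.113.7', 'port': 8080, 'zone': 'US'},
    ],
    'page': 1,
}

# The same unpack hook used in the example above.
unpack = lambda data, target: data['results']

# This list is what would be placed on the result queue for this target.
print(unpack(sample_payload, 'https://gg.my-dev.app/api/v1/proxies/available?zone=US&anonimity=all&protocol=all&page=1&size=100&type=example'))
```
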
## A Proxy Relay: Installation
Simply run

pip install aproxyrelay

+### Parameters
+
+| Parameter | Type | Function | Description |
+|-----------|------|----------|-------------|
+| targets | list[str] | Target endpoints provided in an array | Each endpoint will be requested with an available proxy. If a proxy is unavailable and the request fails, the target is stored in a queue and retried with another proxy until data is obtained. |
+| timeout | int | Allowed proxy timeout. **Defaults to 5** | A proxy has to respond within the provided timeout to be considered valid. Otherwise, it will be discarded. |
+| scrape | bool | Indicator to utilize the proxy scraper. **Defaults to True** | When set to True (default), the proxy scraper is used, which is slower but provides a broader range of proxies. When set to False, proxies are fetched from a single source, offering a faster but more limited selection. |
+| filter | bool | Indicator for filtering bad proxies. **Defaults to True** | If set to True (default), the tool will test proxy connections before using them. This takes a bit longer, but it ensures that proxies are valid before utilization. |
+| zones | list[str] | An array of proxy zones. **Defaults to ['US']** | Sometimes it matters where a proxy is located. Each item in this list ensures the proxy is located in that specific zone, and requests made through the proxy originate from the provided location. It acts as a whitelist for allowed proxy locations. |
+| unpack | lambda | Anonymous function for unpacking data. **Defaults to `lambda data, target: data`** | When a request has been made to a target through a proxy and data has been fetched, this lambda formats the result data before putting it into the result queue. **data** -> output from the target, **target** -> target URL. |
+| debug | bool | Indicator which enables debug mode. **Defaults to False** | When True, additional logging will be printed to the terminal. |
+
## A Proxy Relay: Local Development
To install all library dependencies for local development, excluding the core code available locally, use the following command within a virtual environment:

@@ -89,10 +112,12 @@ from .core import ScraperCore
class MainScraper(ScraperCore):
    def __init__(self) -> None:
        ScraperCore.__init__(self)
+        self.zone = None

    @classmethod
    async def format_url(cls, url, *args, **kwargs) -> str:
        """Formats URL before scraping, lets us adjust query parameters for each parser"""
+        cls.zone = kwargs.get("zone", "us")
        new_url = f'{url}'
        return new_url
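In the snippet above, `format_url` only records the requested zone and returns the URL untouched; concrete parsers are expected to rewrite the query string themselves. Below is a minimal sketch of such a parser. The class name and the zone-rewriting logic are illustrative assumptions, not part of the library:

```py
import asyncio
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


class ExampleParser:
    """Illustrative parser following the MainScraper.format_url pattern."""
    zone = None

    @classmethod
    async def format_url(cls, url, *args, **kwargs) -> str:
        # Remember the requested zone, then rewrite the query string so the
        # target endpoint filters proxies by that zone.
        cls.zone = kwargs.get('zone', 'us')
        parts = urlparse(url)
        query = parse_qs(parts.query)
        query['zone'] = [cls.zone.upper()]
        return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


print(asyncio.run(ExampleParser.format_url(
    'https://gg.my-dev.app/api/v1/proxies/available?zone=US&page=1',
    zone='de',
)))
# -> https://gg.my-dev.app/api/v1/proxies/available?zone=DE&page=1
```
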

aproxyrelay/__init__.py

Lines changed: 47 additions & 42 deletions
@@ -12,13 +12,12 @@
Automatically rotates bad proxy servers, preserves data which failed to be requested.
Makes scraping APIs easy and fun.
"""
-import asyncio
+from asyncio import get_event_loop, gather
from datetime import datetime, UTC
+from logging import basicConfig, INFO, DEBUG, getLogger
+from typing import Callable
from queue import Queue

-import logging
-import sys
-
from .core import AProxyRelayCore

@@ -27,40 +26,57 @@ def __init__(
        self,
        targets: list[str],
        timeout: int = 5,
-        test_proxy: bool = True,
-        test_timeout: int = 20,
-        zone: str = 'us',
+        scrape: bool = True,
+        filter: bool = True,
+        zones: list[str] = ['US'],  # noqa: B006
+        unpack: Callable = lambda data, target: data,
        debug: bool = False,
-        steam: bool = False
    ) -> None:
        """
        Initialize an instance of AProxyRelay.

        Args:
-            targets (list[str]): Target URL's to obtain data from.
-            timeout (int): Amount of time in seconds before a connection is cancelled if not succeeded.
-            test_proxy (bool): When True, test proxy connections before utilizing them.
-            test_timeout (int): Timeout for testing proxy connections in seconds.
-            zone (str): Zone identifier, e.g., 'us', 'nl', 'de', 'uk', etc.
-            debug (bool): Enable debug mode if True.
-            steam (bool): Enable Steam mode if True.
+            targets: list[str]: Target URLs to obtain data from.
+            timeout: int: Amount of time in seconds before a connection is cancelled if not succeeded.
+            scrape: bool: When True, scrape for proxies (slow). Otherwise fetch them from one source (fast).
+            filter: bool: When True, test proxy connections before utilizing them.
+            zones: list[str]: List of whitelisted proxy zones. Only use proxies located in the provided array.
+            unpack: Callable: Filter extracted data through an anonymous method.
+            debug: bool: When True, output debug logs to the terminal.
+
+        Example:
+            ```py
+            proxy_relay = AProxyRelay(
+                targets=targets,
+                timeout=5,
+                scrape=True,
+                filter=True,
+                zones=['US', 'DE'],
+                unpack=lambda data, target: data[target.split('appids=')[1]]['success'],
+                debug=True,
+            )
+            ```
        """
        # Configure the logger
-        logging.basicConfig(level=logging.INFO if not debug else logging.DEBUG)
-        self.logger = logging.getLogger(__name__)
-
-        # TODO raise exceptions
-        self.timeout = timeout
-        self.test_timeout = test_timeout
-        self.test_proxy = test_proxy
-        self.zone = zone.upper()
-        self.debug = debug
-        self._steam = steam
+        basicConfig(level=INFO if not debug else DEBUG)
+        self.logger = getLogger(__name__)

+        # Initialize Core
        AProxyRelayCore.__init__(self)
+
+        # TODO raise exceptions for class arguments
+        self._queue_target_process = Queue(maxsize=len(targets))
        for item in list(set(targets)):
            self._queue_target_process.put(item)

+        self.timeout = timeout
+        self.scrape = scrape
+        self.filter = filter
+        self.zones = [z.upper() for z in zones]
+        self.unpack = unpack
+        self.debug = debug
+        self.started = None

    async def _main(self) -> Queue:
        """
        Start the scrape task asynchronously. Once finished, you will end up with the data from the API in a Queue.
@@ -82,25 +98,14 @@ def start(self) -> Queue:
        Returns:
            Queue: A queue containing the scraped data from the API.
        """
-        started = datetime.now(UTC)
-        self.logger.info(f'Started proxy relay at {started} ... Please wait ...!')
+        self.started = datetime.now(UTC)
+        self.logger.info(f'Started proxy relay at {self.started} ... Please wait ...!')

-        if sys.platform == "win32":
-            loop = asyncio.ProactorEventLoop()
-        else:
-            loop = asyncio.SelectorEventLoop()
+        loop = get_event_loop()
        loop.set_debug(self.debug)
+        results = loop.run_until_complete(gather(self._main()))
+        result = results.pop()

-        try:
-            # Create a task and set its name
-            task = loop.create_task(self._main())
-            task.set_name("AProxyRelay")
-
-            loop.run_until_complete(task)
-            self.logger.info(f'Data scraped! Took {datetime.now(UTC) - started}, enjoy!')
-
-            result = task.result()
-        finally:
-            loop.close()
+        self.logger.info(f'Data scraped! Took {datetime.now(UTC) - self.started}, enjoy!')

        return result
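For readers skimming the new `start()` above: `gather()` wraps `_main()` into a single awaitable, `run_until_complete` drives the loop until it resolves, and the result list holds one entry per awaitable, hence the `.pop()`. A self-contained sketch of the same control flow, with a stub coroutine standing in for `AProxyRelay._main()` (names are illustrative):

```py
from asyncio import gather, get_event_loop
from queue import Queue


async def _main() -> Queue:
    # Stand-in for AProxyRelay._main(): fill a queue asynchronously.
    queue = Queue()
    queue.put('example-item')
    return queue


loop = get_event_loop()
# run_until_complete returns a list with one result per awaitable passed
# to gather(); here that is the single queue produced by _main().
results = loop.run_until_complete(gather(_main()))
result = results.pop()
print(result.qsize())  # -> 1
```

Note that calling `asyncio.get_event_loop()` outside a running loop is deprecated on recent Python versions, so treat this as a sketch of the committed code rather than a recommended pattern.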
