chore: Add test server and some top level crawler tests #517
Merged
Commits
14 commits
c3d72eb Add test server and some top level Crawler tests (Pijukatel)
3a6afee Add max retries test (Pijukatel)
be6935d Update retry test (Pijukatel)
45d839a Handle format and mypy (Pijukatel)
ffdc61a Add uvicorn requirement (Pijukatel)
89ec124 Fix CI errors (Pijukatel)
346da1e Move uvicorn to dev dependencies (Pijukatel)
37cc555 Merge remote-tracking branch 'origin/master' into crawler-tests (Pijukatel)
ad8f5be crawlee[parsel] to dev dependencies (Pijukatel)
904f566 Update requirements.txt (Pijukatel)
6272200 Update the test and use latest crawlee 0.6.12 (Pijukatel)
c618ab6 Merge remote-tracking branch 'origin/master' into crawler-tests (Pijukatel)
f563d87 Unpin versions from doceker file requirements.txt (Pijukatel)
470c3bd Update requirements.txt (Pijukatel)
@@ -0,0 +1,101 @@
```python
"""
The test server is an infinite server at http://localhost:8080/{any_number}; each page links to its 10 child pages.
For example:
http://localhost:8080/ contains links:
http://localhost:8080/0, http://localhost:8080/1, ..., http://localhost:8080/9

http://localhost:8080/1 contains links:
http://localhost:8080/10, http://localhost:8080/11, ..., http://localhost:8080/19

... and so on.
"""

import asyncio
import logging
from collections.abc import Awaitable, Callable, Coroutine
from socket import socket
from typing import Any

from uvicorn import Config
from uvicorn.server import Server
from yarl import URL

Receive = Callable[[], Awaitable[dict[str, Any]]]
Send = Callable[[dict[str, Any]], Coroutine[None, None, None]]


async def send_html_response(send: Send, html_content: bytes, status: int = 200) -> None:
    """Send an HTML response to the client."""
    await send(
        {
            'type': 'http.response.start',
            'status': status,
            'headers': [[b'content-type', b'text/html; charset=utf-8']],
        }
    )
    await send({'type': 'http.response.body', 'body': html_content})


async def app(scope: dict[str, Any], _: Receive, send: Send) -> None:
    """Main ASGI application handler that routes requests to specific handlers.

    Args:
        scope: The ASGI connection scope.
        _: The ASGI receive function.
        send: The ASGI send function.
    """
    assert scope['type'] == 'http'
    path = scope['path']

    # Every page links to 10 children: its own path with a digit 0-9 appended.
    links = '\n'.join(f'<a href="{path}{i}">{path}{i}</a>' for i in range(10))
    await send_html_response(
        send,
        f"""\
<html><head>
<title>Title for {path} </title>
</head>
<body>
{links}
</body></html>""".encode(),
    )


class TestServer(Server):
    """A test HTTP server implementation based on Uvicorn Server."""

    @property
    def url(self) -> URL:
        """Get the base URL of the server.

        Returns:
            A URL instance with the server's base URL.
        """
        protocol = 'https' if self.config.is_ssl else 'http'
        return URL(f'{protocol}://{self.config.host}:{self.config.port}/')

    async def serve(self, sockets: list[socket] | None = None) -> None:
        """Run the server."""
        if sockets:
            raise RuntimeError('Simple TestServer does not support custom sockets')
        self.restart_requested = asyncio.Event()

        loop = asyncio.get_running_loop()
        tasks = {
            loop.create_task(super().serve()),
        }
        await asyncio.wait(tasks)


if __name__ == '__main__':
    asyncio.run(
        TestServer(
            config=Config(
                app=app,
                lifespan='off',
                loop='asyncio',
                port=8080,
                log_config=None,
                log_level=logging.CRITICAL,
            )
        ).serve()
    )
```
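The link layout described in the module docstring can be checked without starting Uvicorn by calling the ASGI callable directly with a fake `send` that records the outgoing messages. The sketch below re-declares a trimmed, stdlib-only copy of `app` (the PR's module path is not shown, so nothing is imported from it); `collect_response` is a hypothetical helper introduced only for this illustration.

```python
import asyncio
import re
from collections.abc import Awaitable, Callable
from typing import Any

Receive = Callable[[], Awaitable[dict[str, Any]]]
Send = Callable[[dict[str, Any]], Awaitable[None]]


async def app(scope: dict[str, Any], _: Receive, send: Send) -> None:
    """Trimmed copy of the PR's handler: the page at {path} links to {path}0 ... {path}9."""
    assert scope['type'] == 'http'
    path = scope['path']
    links = '\n'.join(f'<a href="{path}{i}">{path}{i}</a>' for i in range(10))
    html = f'<html><head><title>Title for {path} </title></head><body>{links}</body></html>'
    await send(
        {
            'type': 'http.response.start',
            'status': 200,
            'headers': [[b'content-type', b'text/html; charset=utf-8']],
        }
    )
    await send({'type': 'http.response.body', 'body': html.encode()})


async def collect_response(path: str) -> tuple[int, list[str]]:
    """Hypothetical helper: call the ASGI app once and return (status, linked hrefs)."""
    messages: list[dict[str, Any]] = []

    async def receive() -> dict[str, Any]:
        # A GET request has no body, so one empty 'http.request' message is enough.
        return {'type': 'http.request', 'body': b'', 'more_body': False}

    async def send(message: dict[str, Any]) -> None:
        # Record every ASGI message instead of writing it to a socket.
        messages.append(message)

    await app({'type': 'http', 'path': path}, receive, send)
    status = messages[0]['status']
    body = messages[1]['body'].decode()
    return status, re.findall(r'href="([^"]+)"', body)


if __name__ == '__main__':
    status, hrefs = asyncio.run(collect_response('/1'))
    print(status, hrefs[0], hrefs[-1])  # 200 /10 /19
```

Driving the callable in-process keeps the check fast and deterministic; the real `TestServer` is still needed when a crawler has to reach the pages over HTTP.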
@@ -0,0 +1,112 @@
```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .conftest import MakeActorFunction, RunActorFunction


async def test_actor_on_platform_max_crawl_depth(
    make_actor: MakeActorFunction,
    run_actor: RunActorFunction,
) -> None:
    """Test that the actor respects max_crawl_depth."""

    async def main() -> None:
        """The crawler entry point."""
        import re

        from crawlee.crawlers import ParselCrawler, ParselCrawlingContext

        from apify import Actor

        async with Actor:
            crawler = ParselCrawler(max_crawl_depth=2)
            finished: list[str] = []
            enqueue_pattern = re.compile(r'http://localhost:8080/2+$')

            @crawler.router.default_handler
            async def default_handler(context: ParselCrawlingContext) -> None:
                """Default request handler."""
                context.log.info(f'Processing {context.request.url} ...')
                await context.enqueue_links(include=[enqueue_pattern])
                finished.append(context.request.url)

            await crawler.run(['http://localhost:8080/'])
            assert finished == ['http://localhost:8080/', 'http://localhost:8080/2', 'http://localhost:8080/22']

    actor = await make_actor(label='crawler-max-depth', main_func=main)
    run_result = await run_actor(actor)

    assert run_result.status == 'SUCCEEDED'


async def test_actor_on_platform_max_requests_per_crawl(
    make_actor: MakeActorFunction,
    run_actor: RunActorFunction,
) -> None:
    """Test that the actor respects max_requests_per_crawl."""

    async def main() -> None:
        """The crawler entry point."""
        from crawlee import ConcurrencySettings
        from crawlee.crawlers import ParselCrawler, ParselCrawlingContext

        from apify import Actor

        async with Actor:
            crawler = ParselCrawler(
                max_requests_per_crawl=3, concurrency_settings=ConcurrencySettings(max_concurrency=1)
            )
            finished: list[str] = []

            @crawler.router.default_handler
            async def default_handler(context: ParselCrawlingContext) -> None:
                """Default request handler."""
                context.log.info(f'Processing {context.request.url} ...')
                await context.enqueue_links()
                finished.append(context.request.url)

            await crawler.run(['http://localhost:8080/'])
            assert len(finished) == 3

    actor = await make_actor(label='crawler-max-requests', main_func=main)
    run_result = await run_actor(actor)

    assert run_result.status == 'SUCCEEDED'


async def test_actor_on_platform_max_request_retries(
    make_actor: MakeActorFunction,
    run_actor: RunActorFunction,
) -> None:
    """Test that the actor respects max_request_retries."""

    async def main() -> None:
        """The crawler entry point."""
        from crawlee.crawlers import BasicCrawlingContext, ParselCrawler, ParselCrawlingContext

        from apify import Actor

        async with Actor:
            max_retries = 3
            crawler = ParselCrawler(max_request_retries=max_retries)
            failed_counter = 0

            @crawler.error_handler
            async def failed_handler(_: BasicCrawlingContext, __: Exception) -> None:
                nonlocal failed_counter
                failed_counter += 1

            @crawler.router.default_handler
            async def default_handler(_: ParselCrawlingContext) -> None:
                raise RuntimeError('Some error')

            await crawler.run(['http://localhost:8080/'])
            # Should be max_retries + 1, see https://github.com/apify/crawlee-python/issues/1326
            assert failed_counter == max_retries, f'{failed_counter=}'

    actor = await make_actor(label='crawler-max-retries', main_func=main)
    run_result = await run_actor(actor)

    assert run_result.status == 'SUCCEEDED'
```
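The expected values in `test_actor_on_platform_max_crawl_depth` and `test_actor_on_platform_max_requests_per_crawl` follow from the test server's link tree. The sketch below replays a breadth-first crawl over that tree offline. It is a simplified model, not Crawlee's implementation, and it assumes that the start URL has depth 0, that a link found on a page at depth `d` gets depth `d + 1` and is dropped once that exceeds `max_crawl_depth`, and that `include` patterns are applied with `re.match`. `simulate_crawl` and `child_links` are hypothetical names.

```python
import re
from collections import deque
from typing import Optional

BASE = 'http://localhost:8080'


def child_links(url: str) -> list[str]:
    """Links on a test-server page: the page URL with each digit 0-9 appended."""
    return [f'{url}{i}' for i in range(10)]


def simulate_crawl(
    start_url: str,
    *,
    max_crawl_depth: Optional[int] = None,
    max_requests: Optional[int] = None,
    include: Optional[re.Pattern[str]] = None,
) -> list[str]:
    """Breadth-first model of the crawl; set at least one limit, the tree is infinite."""
    finished: list[str] = []
    queue: deque[tuple[str, int]] = deque([(start_url, 0)])  # (url, crawl depth)
    seen = {start_url}
    while queue:
        # Model of max_requests_per_crawl: stop once N pages have been handled.
        if max_requests is not None and len(finished) >= max_requests:
            break
        url, depth = queue.popleft()
        finished.append(url)
        # Model of max_crawl_depth: children of a page at the limit are not enqueued.
        if max_crawl_depth is not None and depth + 1 > max_crawl_depth:
            continue
        for link in child_links(url):
            # Model of enqueue_links(include=[...]): keep only matching URLs.
            if include is not None and not include.match(link):
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return finished


if __name__ == '__main__':
    depth_run = simulate_crawl(f'{BASE}/', max_crawl_depth=2, include=re.compile(rf'{BASE}/2+$'))
    print(depth_run)  # ['http://localhost:8080/', 'http://localhost:8080/2', 'http://localhost:8080/22']
    print(len(simulate_crawl(f'{BASE}/', max_requests=3)))  # 3
```

With `max_crawl_depth=2` and the `2+$` pattern, exactly one link survives on each level, which is why the depth test can assert an exact list, while the `max_requests_per_crawl` test asserts only the number of handled requests.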