Commit bd4b940

docs: improve max_request_retries and max_session_rotations documentation (#1192)
1 parent 9a63edf · commit bd4b940

File tree

1 file changed: +16 -3 lines


src/crawlee/crawlers/_basic/_basic_crawler.py

Lines changed: 16 additions & 3 deletions
@@ -110,7 +110,11 @@ class _BasicCrawlerOptions(TypedDict):
     """HTTP client used by `BasicCrawlingContext.send_request` method."""
 
     max_request_retries: NotRequired[int]
-    """Maximum number of attempts to process a single request."""
+    """Specifies the maximum number of retries allowed for a request if its processing fails.
+    This includes retries due to navigation errors or errors thrown from user-supplied functions
+    (`request_handler`, `pre_navigation_hooks` etc.).
+
+    This limit does not apply to retries triggered by session rotation (see `max_session_rotations`)."""
 
     max_requests_per_crawl: NotRequired[int | None]
     """Maximum number of pages to open during a crawl. The crawl stops upon reaching this limit.
@@ -119,7 +123,10 @@ class _BasicCrawlerOptions(TypedDict):
 
     max_session_rotations: NotRequired[int]
     """Maximum number of session rotations per request. The crawler rotates the session if a proxy error occurs
-    or if the website blocks the request."""
+    or if the website blocks the request.
+
+    The session rotations are not counted towards the `max_request_retries` limit.
+    """
 
     max_crawl_depth: NotRequired[int | None]
     """Specifies the maximum crawl depth. If set, the crawler will stop processing links beyond this depth.
@@ -269,14 +276,20 @@ def __init__(
             proxy_configuration: HTTP proxy configuration used when making requests.
             http_client: HTTP client used by `BasicCrawlingContext.send_request` method.
             request_handler: A callable responsible for handling requests.
-            max_request_retries: Maximum number of attempts to process a single request.
+            max_request_retries: Specifies the maximum number of retries allowed for a request if its processing fails.
+                This includes retries due to navigation errors or errors thrown from user-supplied functions
+                (`request_handler`, `pre_navigation_hooks` etc.).
+
+                This limit does not apply to retries triggered by session rotation (see `max_session_rotations`).
             max_requests_per_crawl: Maximum number of pages to open during a crawl. The crawl stops upon reaching
                 this limit. Setting this value can help avoid infinite loops in misconfigured crawlers. `None` means
                 no limit. Due to concurrency settings, the actual number of pages visited may slightly exceed
                 this value. If used together with `keep_alive`, then the crawler will be kept alive only until
                 `max_requests_per_crawl` is achieved.
            max_session_rotations: Maximum number of session rotations per request. The crawler rotates the session
                if a proxy error occurs or if the website blocks the request.
+
+                The session rotations are not counted towards the `max_request_retries` limit.
            max_crawl_depth: Specifies the maximum crawl depth. If set, the crawler will stop processing links beyond
                this depth. The crawl depth starts at 0 for initial requests and increases with each subsequent level
                of links. Requests at the maximum depth will still be processed, but no new links will be enqueued
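
For context, a minimal sketch of how the two options documented above might be set on a `BasicCrawler` (not part of this commit; the import paths, router API, and values are assumptions based on a recent crawlee release):

import asyncio

from crawlee.crawlers import BasicCrawler, BasicCrawlingContext


async def main() -> None:
    crawler = BasicCrawler(
        # Allow up to 2 retries when processing a request fails, e.g. due to a
        # navigation error or an exception raised in the request handler.
        max_request_retries=2,
        # Independently of the retry limit above, rotate the session up to 5 times
        # when a proxy error occurs or the website blocks the request. These
        # rotations are not counted towards max_request_retries.
        max_session_rotations=5,
    )

    @crawler.router.default_handler
    async def handler(context: BasicCrawlingContext) -> None:
        # Any exception raised here counts against max_request_retries.
        context.log.info(f'Processing {context.request.url} ...')

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())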
