feat: add `ImpitHttpClient` http-client client using the `impit` library #1151

Mantisus · 2025-04-14T14:57:53Z

Description

Add ImpitHttpClient, an HTTP client that uses the impit library.

Issues

Relates: Integrate impit as HTTP client #1079

Testing

Added tests for ImpitHttpClient. ImpitHttpClient is also enabled in all tests that use an HTTP client.

Mantisus · 2025-04-14T15:03:53Z

For now, I suggest adding impit as an additional dependency, as it still needs some tweaking before it's ready to replace httpx.

Awaiting a decision - apify/impit#123

Copilot

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

pyproject.toml:63

The version requirement for impit in the main dependencies (>=0.1.0) differs from the one in the adaptive-crawler section (>=0.2.0), which may lead to dependency conflicts. Consider aligning them to a consistent version.

"impit>=0.1.0",

Copilot

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Mantisus · 2025-07-07T22:52:24Z

The Python binding for Impit now has all the basic functionality needed to integrate it into Crawlee.

The _get_client method is implemented based on ImpitHttpClient. However, this looks inefficient, especially when working without a proxy but with a SessionPool of size greater than 1, because a new client is created for each request. I think we should improve this on the impit side. @barjin, maybe you'll have some ideas.

I propose replacing httpx with impit as the main client in a separate PR.
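To make the concern about per-request client creation concrete, here is a minimal sketch of one possible mitigation: caching clients by a (proxy URL, session ID) key so that repeated requests from the same session reuse one client. This is not what the PR implements and not impit's API; `StubClient`, `ClientCache`, and the `session_id` key are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StubClient:
    """Hypothetical stand-in for an HTTP client (not impit's real client class)."""

    proxy_url: str | None
    session_id: str


@dataclass
class ClientCache:
    """Reuse one client per (proxy URL, session ID) pair instead of one per request."""

    max_size: int = 10
    created: int = 0  # how many clients were actually constructed
    _clients: dict[tuple[str | None, str], StubClient] = field(default_factory=dict)

    def get_client(self, proxy_url: str | None, session_id: str) -> StubClient:
        key = (proxy_url, session_id)
        client = self._clients.get(key)
        if client is None:
            if len(self._clients) >= self.max_size:
                # Evict the oldest entry; dicts preserve insertion order.
                self._clients.pop(next(iter(self._clients)))
            client = StubClient(proxy_url, session_id)
            self._clients[key] = client
            self.created += 1
        return client
```

With such a cache, two requests from the same session and proxy would share a client, while a different session would still get its own client and therefore its own cookies. Whether this belongs in Crawlee or in impit itself is the open question raised above.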

Pijukatel · 2025-07-08T07:58:04Z

tests/unit/http_clients/test_impit.py

I see you are following the established pattern of adding a new test file for the new client, but maybe now is the time to refactor the tests: keep a single test file for all clients and parametrize every test by client.

For example:

@pytest.mark.parametrize("http_client", [
    CurlImpersonateHttpClient(http_version=CurlHttpVersion.V1_1),
    ImpitHttpClient(),
    HttpxHttpClient(http2=False),
])
async def test_http_1(http_client: HttpClient, server_url: URL) -> None:
    response = await http_client.send_request(str(server_url))
    assert response.http_version == 'HTTP/1.1'

Maybe we would need three client factories instead and parametrize by those. Regardless, I think it would be great to reduce code duplication and ensure that we run exactly the same tests for all clients, so that they all work in exactly the same way.

But this is just a suggestion. Maybe it can be done in a separate PR as well, to avoid mixing the new implementation with pure refactoring.
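The factory variant mentioned above could look roughly like the sketch below. It uses a hypothetical `StubHttpClient` in place of the real Crawlee clients so that it runs on its own; the factory names and the `check_http_1` helper are likewise assumptions for illustration, not code from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class StubHttpClient:
    """Hypothetical stand-in for a real client; it only reports a fixed HTTP version."""

    name: str
    http_version: str

    def send_request(self, url: str) -> str:
        # A real client would perform network I/O here.
        return self.http_version


# One factory per client flavour, so each test gets a fresh instance.
CLIENT_FACTORIES: dict[str, Callable[[], StubHttpClient]] = {
    'curl': lambda: StubHttpClient('curl', 'HTTP/1.1'),
    'impit': lambda: StubHttpClient('impit', 'HTTP/1.1'),
    'httpx': lambda: StubHttpClient('httpx', 'HTTP/1.1'),
}


def check_http_1(client_factory: Callable[[], StubHttpClient]) -> None:
    """The shared test body: identical for every client."""
    client = client_factory()
    assert client.send_request('http://localhost') == 'HTTP/1.1'


# With pytest, the same body could be parametrized by factory, for example:
# @pytest.mark.parametrize(
#     'client_factory', list(CLIENT_FACTORIES.values()), ids=list(CLIENT_FACTORIES)
# )
```

Parametrizing by factory rather than by instance avoids constructing every client at import time and gives each test an unshared client.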

That's a great idea. But yes, I think it should be done in a separate PR.

Also, we could take the same approach for the beautifulsoup and parsel crawlers, since their tests are also completely duplicated.

I will wait for this PR to be merged first, and then we can refactor: #1299

vdusek

LGTM

src/crawlee/http_clients/_impit.py

Mantisus · 2025-07-08T12:29:10Z

Not merging this PR until we resolve the test issue

vdusek

Test issue

[gw1] node down: Not properly terminated
[gw1] [ 98%] FAILED tests/unit/test_service_locator.py::test_storage_client_conflict 
replacing crashed worker gw1
tests/unit/crawlers/_parsel/test_parsel_crawler.py::test_enqueue_links_selector[curl]

Makefile

tests/unit/server.py

vdusek

LGTM

Mantisus added 9 commits April 7, 2025 19:38

add impit in dependencies

cc80b76

add base for client

25cf4ed

update with new release

e098775

update version

dfc5599

fix headers

3bd2dc5

merge master

a04d29a

update tests

bb744c5

set default browser impersionate

d99c74c

docs fix

8b49c2e

Mantisus requested a review from Copilot April 14, 2025 15:03

Copilot AI reviewed Apr 14, 2025

fix version in pyproject

75f34d3

Mantisus requested a review from Copilot April 14, 2025 15:29

Copilot AI reviewed Apr 14, 2025

Mantisus self-assigned this Apr 15, 2025

Mantisus added 4 commits June 24, 2025 23:04

Merge branch 'master' into impit-client

874036d

Merge branch 'master' into impit-client

cefab77

update client with cookies and stream method

340e3c6

update types

6c1bfcf

Mantisus marked this pull request as ready for review July 7, 2025 22:52

Mantisus requested review from janbuchar, vdusek and Pijukatel July 7, 2025 22:52

Pijukatel reviewed Jul 8, 2025

Mantisus added 2 commits July 8, 2025 08:51

add stream for compress data

453185e

del print

c0c761a

Pijukatel approved these changes Jul 8, 2025

vdusek approved these changes Jul 8, 2025

src/crawlee/http_clients/_impit.py (outdated, resolved)

Update src/crawlee/http_clients/_impit.py

1d72c1c

vdusek requested changes Jul 8, 2025

test with break freeze connects

9d4068e

Mantisus force-pushed the impit-client branch from 988157e to 9d4068e Compare July 14, 2025 00:56

Mantisus added 4 commits July 14, 2025 01:07

resolve

b77d904

change config

f3f66f4

add timeout for tests

586aa69

resolve

69ff828

vdusek reviewed Jul 14, 2025

Makefile (outdated, resolved)

set separate eventloop policy for uvicorn server

3e0c220

Pijukatel reviewed Jul 15, 2025

tests/unit/server.py (resolved)

add comment

bd1da54

Mantisus requested a review from vdusek July 15, 2025 08:41

vdusek approved these changes Jul 15, 2025

Pijukatel merged commit 0d0d268 into apify:master Jul 15, 2025
19 checks passed
