Enable non-streaming mode in transformers serve
#41446
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I am missing a lot of context, but let's make sure functions are not nested if possible, and that function names are helpful; also, document why we need them.
src/transformers/commands/serving.py (Outdated)
from huggingface_hub import model_info
from huggingface_hub.constants import HF_HUB_OFFLINE
from openai.types.chat.chat_completion import Choice
from starlette.responses import StreamingResponse
IDK what starlette is, prob a new dependency?
(also do we really need it?)
I moved the import from starlette to the one from FastAPI; the FastAPI one is a re-export of the starlette one, but it's more coherent from an import-shielding perspective.
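For context, the "re-export" point can be illustrated with a minimal sketch using hypothetical stand-in modules (so it runs without FastAPI or Starlette installed): the framework simply rebinds the vendor's class, so swapping the import source yields the very same object.

```python
import types

# Hypothetical stand-ins: "vendor" plays the role of starlette,
# "framework" the role of FastAPI re-exporting the vendor's class.
vendor = types.ModuleType("vendor")

class StreamingResponse:
    """Defined once, in the low-level package."""

vendor.StreamingResponse = StreamingResponse

framework = types.ModuleType("framework")
framework.StreamingResponse = vendor.StreamingResponse  # the re-export

# Importing via the framework yields the identical class object, so the
# import move changes nothing at runtime — only which dependency the
# importing code names directly.
assert framework.StreamingResponse is vendor.StreamingResponse
```

This is why the swap is purely an import-hygiene change: callers depend on one top-level package instead of reaching into its transitive dependency.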
src/transformers/commands/serving.py (Outdated)
return stream_chat_completion(generation_streamer, request_id)
if req.get("stream"):

def sse(_generator):
not a very helpful func name
Correct, replaced it with map so that it's cleaner.
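A minimal sketch of the idea (hypothetical payloads and helper name, not the PR's actual code): instead of a nested `sse` function, a top-level formatter applied with `map` turns each generation chunk into a Server-Sent Events frame.

```python
import json

def to_sse(payload: dict) -> str:
    # One Server-Sent Events frame: "data: <json>" followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"

# A generator of completion chunks (hypothetical), wrapped with map
# rather than a nested helper function.
chunks = ({"token": t} for t in ["Hello", " world"])
events = map(to_sse, chunks)

assert next(events) == 'data: {"token": "Hello"}\n\n'
```

`map` keeps the lazy, one-chunk-at-a-time behavior a streaming response needs, without introducing a nested closure.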
src/transformers/modeling_utils.py (Outdated)
load_and_register_attn_kernel(applicable_attn_implementation)
# log that we used kernel fallback if successful
- if attn_implementation.startswith("flash_attention"):
+ if attn_implementation.startswith("flash_"):
we should check `"flash_" in attn_implementation` instead, because if we fall back to / use a kernel, the name starts with kernel_community/....
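A small illustration of why the substring test is needed (the fallback name below is hypothetical, just shaped like the `kernel_community/...` prefix mentioned above):

```python
# A plain implementation name vs. a kernel-fallback name (hypothetical).
plain = "flash_attention_2"
fallback = "kernel_community/flash_attn_example"

assert "flash_" in plain                   # substring test matches both
assert "flash_" in fallback
assert not fallback.startswith("flash_")   # startswith misses the fallback
```

With `startswith`, the "used kernel fallback" log line would never fire for kernel-prefixed names; the `in` check catches both spellings.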
Good idea!
I cleaned it up a bit @ArthurZucker; IMO we might want to take a look at simplifying the stream/non-stream path in the future, depending on how the tool handling is done (still needs to be implemented for non-streaming generate and everything CB).
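For readers without the diff open, the stream/non-stream split this PR enables can be sketched as follows (hypothetical function and return values, not the actual serving.py code): the request's `stream` flag selects between an SSE streaming response and a single JSON completion.

```python
def handle_chat_completion(req: dict) -> str:
    """Hypothetical dispatch mirroring a chat-completions endpoint."""
    if req.get("stream"):
        # Streaming path: wrap the token generator in an SSE response.
        return "streaming-response"
    # Non-streaming path (new in this PR): run generation to completion
    # and return one JSON body.
    return "json-response"

assert handle_chat_completion({"stream": True}) == "streaming-response"
assert handle_chat_completion({}) == "json-response"
```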
thanks!
* Enable non-streaming in transformers serve
* Remove typos
* Fix tests
* Arthur review
Needs this to be merged first: #41444
Tests and docs need to be added before undrafting.