
Commit 3230728

Fix bug for offline inference in scheduler v1 (#3117)
1 parent: 583eae2

File tree

1 file changed (+1, -0 lines)


fastdeploy/engine/engine.py

Lines changed: 1 addition & 0 deletions

@@ -500,6 +500,7 @@ def add_requests(self, task, sampling_params=None, **kwargs):
         enable_thinking = kwargs.get("enable_thinking", None)
         request = self.data_processor.process_request(request, self.cfg.max_model_len, enable_thinking=enable_thinking)
         request.prompt_token_ids_len = len(request.prompt_token_ids)
+        request.need_prefill_tokens = request.prompt_token_ids_len
         input_ids_len = request.prompt_token_ids_len
         request.set(
             "max_tokens",
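
The one-line patch above seeds `request.need_prefill_tokens` from the prompt length when a request is added. The sketch below is a minimal illustration (the `Request` class and `schedule_prefill` function here are hypothetical stand-ins, not FastDeploy's actual classes) of why a scheduler that tracks remaining prefill work needs that attribute initialized up front:

```python
# Hypothetical sketch: a v1-style scheduler decrements a per-request
# prefill counter each step. If the counter is never initialized when the
# request is created, the first scheduling step raises AttributeError --
# which is the kind of offline-inference failure this commit fixes.

class Request:
    def __init__(self, prompt_token_ids):
        self.prompt_token_ids = prompt_token_ids
        # Mirrors the patched add_requests(): record the prompt length,
        # then seed the prefill counter from it (the added line).
        self.prompt_token_ids_len = len(self.prompt_token_ids)
        self.need_prefill_tokens = self.prompt_token_ids_len

def schedule_prefill(request, chunk_size):
    """Consume up to chunk_size tokens of the remaining prefill work."""
    step = min(chunk_size, request.need_prefill_tokens)
    request.need_prefill_tokens -= step
    return step

req = Request(prompt_token_ids=[101, 7592, 2088, 102])
assert req.need_prefill_tokens == 4      # seeded from prompt length
assert schedule_prefill(req, chunk_size=3) == 3
assert schedule_prefill(req, chunk_size=3) == 1
assert req.need_prefill_tokens == 0      # prefill complete
```

With chunked prefill, the counter also lets the scheduler resume a partially prefilled prompt across steps, which is why it is stored on the request rather than recomputed.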
