--rate-type=sweep does not load vllm with expected count of requests #273

Answered by sjmonson
psydok asked this question in User Support
For the row you highlighted in the first screenshot, first keep these things in mind:

  1. constant@0.5 := Send 0.5 requests per second, i.e. one new request every 2 seconds
  2. 25.40 Lat := Each request takes 25.4 seconds on average to complete

Starting at the beginning of the test, the number of running requests increases by one every two seconds (1). The first request finishes after 25.4 seconds. From that point on, new requests are still being added at a rate of 0.5 per second, while running requests finish at the same rate (assuming every request takes the same amount of time). In those first 25.4 seconds, 25.4/2 ~= 12 requests were started. Assuming neither the rate of requests nor the…
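To make the arithmetic above concrete, here is a minimal sketch in Python. It is not part of the benchmarking tool; the function names are hypothetical, and it assumes the first request is sent at t = 0, requests arrive at exactly the constant rate, and every request takes exactly the average latency. Under those assumptions the number of in-flight requests ramps up for the first 25.4 seconds and then hovers around rate × latency.

```python
def steady_state_concurrency(rate_per_s: float, latency_s: float) -> float:
    """Average number of in-flight requests once the ramp-up is over
    (arrival rate multiplied by time spent per request)."""
    return rate_per_s * latency_s


def running_requests(t: float, rate_per_s: float, latency_s: float) -> int:
    """Requests in flight at time t.

    Assumptions (illustrative only): requests start at t = 0, 1/rate, 2/rate, ...
    and each one takes exactly latency_s seconds.
    """
    interval = 1.0 / rate_per_s
    # Requests started at or before t (the first one starts at t = 0).
    started = int(t // interval) + 1
    # Requests whose start time is at or before t - latency_s have finished.
    finished = int((t - latency_s) // interval) + 1 if t >= latency_s else 0
    return started - finished


if __name__ == "__main__":
    rate, latency = 0.5, 25.4  # constant@0.5 and 25.40 s latency from the row above
    print("steady-state average:", steady_state_concurrency(rate, latency))
    for t in (0, 10, 20, 25.4, 40, 60):
        print(f"t={t:>5}s running={running_requests(t, rate, latency)}")
```

With these numbers the in-flight count never climbs much past 12 to 13, no matter how long the test runs, because completions keep pace with arrivals once the first request has finished.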

Answer selected by psydok
This discussion was converted from issue #272 on August 11, 2025 21:26.