Why is my latency getting worse over time? #14024
Answered by
ExtReMLapin
Jun 5, 2025
Since I process requests one after the other, I have adapted the deployment. With --ctx-size 8192 and --parallel 1, I get stable latencies of 9-10 seconds. Nevertheless, I would still like to know what caused the problem in my initial deployment.
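For anyone who wants to check whether their own setup shows the same drift, here is a minimal sketch of how sequential latencies could be recorded and compared. It is not tied to any particular server: `send_request` is a hypothetical callable you supply (for example, a function that posts one prompt to your endpoint), and the function names are my own, not part of any library.

```python
import time
from statistics import mean


def measure_latencies(send_request, n_requests, clock=time.perf_counter):
    """Call send_request() n_requests times, one after the other.

    send_request is a hypothetical, user-supplied callable that performs
    a single request and blocks until the response has arrived.
    Returns the latency of each call in seconds.
    """
    latencies = []
    for _ in range(n_requests):
        start = clock()
        send_request()
        latencies.append(clock() - start)
    return latencies


def latency_drift(latencies):
    """Mean latency of the second half minus mean of the first half.

    A value near zero suggests stable latency; a clearly positive value
    suggests latency is getting worse over time.
    """
    half = len(latencies) // 2
    if half == 0:
        raise ValueError("need at least two measurements")
    return mean(latencies[half:]) - mean(latencies[:half])
```

Running this once against the initial deployment and once against the adapted one would give two drift values to compare. The half-versus-half comparison is a deliberately crude choice; a longer run or a proper trend fit would be more robust.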
Cause: #10860
Answer selected by
besrym