Replies: 3 comments
-
If you guys need me to learn how to use GitHub, figure out how to submit a proper PR, and then research how to properly implement and code the solution I'm proposing, I'm willing to give it a shot because I think the quality-of-life improvement is worth it. But it's going to take me a cool minute, as I'm not a coder, I'm more of a dabbler.
-
Yeah, this would be very useful! I ran into this problem earlier this week when testing speculative decoding: OpenWebUI was quietly overriding my command-line settings without me realising it.
-
FWIW: I added support for stripping sampling params to llama-swap while I was also waiting on this feature. I was going to PR it here, but after looking through llama-server's code it felt like a code smell to put it in there. I think it fits better in a layer outside of llama-server.
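To give a feel for the "outer layer" approach, here is a hypothetical sketch, not llama-swap's actual code: delete the sampler fields from the request JSON before forwarding it, so llama-server falls back to its command-line defaults. It uses nlohmann::json, which llama.cpp itself bundles; the field list is an assumption and a real proxy would need to cover every sampling field llama-server accepts.

```cpp
// Hypothetical sketch of stripping client-sent sampler fields from a request
// body before proxying it to llama-server, so the server's command-line
// defaults win. The field names are assumptions for illustration.
#include <nlohmann/json.hpp>

nlohmann::json strip_sampling_params(nlohmann::json body) {
    for (const char * key : {"top_k", "top_p", "min_p", "temperature",
                             "repeat_penalty", "samplers"}) {
        body.erase(key); // erasing a missing key is a harmless no-op
    }
    return body;
}
```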
-
Would it be possible to add a flag to llama-server that ignores any samplers sent by the client and uses the samplers you set when running llama-server? Something like `--ignore-client-samplers 1`.
Alternatively, maybe the default GUI could have a setting to avoid sending the samplers (and sampler order) to the server. I would prefer the first option, though, as it's the most flexible: it's client-agnostic and would work for any client that uses the API but sends its own samplers to the model (even when you don't want it to).
I ask because lately every model comes with developer-recommended sampler settings, so I have to tweak them on the client for every model I run, since the client overrides the server-side settings from the server's launch parameters. It would be nice to not have to worry about it and (optionally) let the server control samplers, just as it currently controls the prompt template from the GGUF metadata. The default behavior for samplers should stay as it is now, with an option at launch to ignore the client's.
This also makes it difficult to let non-technical people play with the local models I serve for them through the default web client, because they don't know how or why to change samplers.
I'm not a coder and I've never submitted a PR (I don't even know how to use git, lol), but the relevant code is in llama.cpp/tools/server/server.cpp, line 253 at commit 2baf077.
You would take all of the lines like this one:
```cpp
params.sampling.top_k = json_value(data, "top_k", defaults.sampling.top_k);
```
And essentially change them to:
```cpp
params.sampling.top_k = defaults.sampling.top_k;
```
Including the sampler order:
```cpp
params.sampling.samplers = defaults.sampling.samplers;
```
Except this would unconditionally ignore the client entirely. So the proper solution would be to create a flag/parameter and wrap the above in an if statement, depending on which way the flag is set when running llama-server. Default behavior would stay as it is now, but with the flag set, the json_value lookups would be skipped (see the sketch below).
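To make the idea concrete, here is a minimal sketch of that if statement, built from the json_value pattern above. The ignore_client_samplers option is a made-up name, and only a few fields are shown; a real patch would need to cover every sampling parameter the server reads from the request.

```cpp
// Hypothetical sketch: gate the client-supplied sampling params behind a
// server-side flag. `ignore_client_samplers` is an assumed new option; it
// does not exist in llama.cpp today.
if (defaults.ignore_client_samplers) {
    // Keep whatever was set on the llama-server command line.
    params.sampling.top_k    = defaults.sampling.top_k;
    params.sampling.top_p    = defaults.sampling.top_p;
    params.sampling.temp     = defaults.sampling.temp;
    params.sampling.samplers = defaults.sampling.samplers;
    // ... same for the remaining sampling fields
} else {
    // Current behavior: the request body overrides the defaults.
    params.sampling.top_k = json_value(data, "top_k",       defaults.sampling.top_k);
    params.sampling.top_p = json_value(data, "top_p",       defaults.sampling.top_p);
    params.sampling.temp  = json_value(data, "temperature", defaults.sampling.temp);
    // ... same for the remaining sampling fields
}
```

Keeping the current behavior as the else branch means the flag is purely opt-in, matching the "default behavior stays as it is now" requirement.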
I could also be entirely wrong about the code part :D