Replies: 1 comment
-
I had a similar experience.
-
Hi there.
I'm using LLaMA 13B Q4_0 with the Python bindings in CPU mode.
I can't get good responses like the ones I'm used to from text-generation-webui, so I think I'm doing something wrong.
This is how I instantiate the model:
llm = Llama(model_path="./llama.cpp/models/13B/ggml-model-q4_0.bin", seed=0, n_ctx=1200)
And this is how I try to get a response:
output = llm(prompt, max_tokens=64, stop=[Human_Name + ":", "\n"], echo=True)
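For reference, here's a minimal, self-contained sketch of the whole setup (the value of Human_Name and the prompt text below are placeholders, not my exact values):

```python
from llama_cpp import Llama

# Placeholder name used for the stop sequence; my real script uses a different value.
Human_Name = "Human"

# Load the quantized 13B model in CPU mode with a fixed seed and a 1200-token context window.
llm = Llama(
    model_path="./llama.cpp/models/13B/ggml-model-q4_0.bin",
    seed=0,
    n_ctx=1200,
)

# Placeholder chat-style prompt; my real prompt is longer.
prompt = f"{Human_Name}: Hi, can you tell me a joke?\nAssistant:"

# Generate up to 64 tokens, stopping at the human's next turn or a newline.
output = llm(prompt, max_tokens=64, stop=[Human_Name + ":", "\n"], echo=True)

# The completion text (including the echoed prompt) is in output["choices"][0]["text"].
print(output["choices"][0]["text"])
```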
Technically everything works, but the quality of the responses I get is way off and nowhere near where I want it to be.
It should behave like chat mode. Mostly I get out-of-context responses, sometimes empty responses, and sometimes gibberish.
Here's the full prompt I use in output = llm():

Can you help me?
Some more info about the model:

Response Meta:
