Issue Description
Context:
I am using Hugging Face Inference Endpoints to deploy models for a leaderboard, via the `huggingface_hub` library with a custom image for text generation. Everything was working fine until recently; now the model either hallucinates or the Inference API returns a sequence of random words in the `generated_text` field.
Problem Description
- **Hallucination or random words:**
  - When using a custom image, the model returns hallucinatory or random text.
  - Without the custom image, the output is more reasonable but still inconsistent.
- **Inconsistency between `requests` and `InferenceClient`:**
  - There is an inconsistency in the output when using the `requests` library and the `InferenceClient`.
Code to Reproduce the Error
Setting Up the Endpoint
```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "test-qwenz",
    repository="Qwen/Qwen2.5-0.5B-Instruct",
    framework="pytorch",
    namespace="<xxxxxxxx>",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_INPUT_LENGTH": "1024",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
    },
)
```
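The `env` block above encodes a token budget, and it can be sanity-checked offline. The two constraints in the sketch below are assumptions based on common TGI behavior (an input longer than the total budget leaves no room for generation, and a prompt should fit in one prefill batch), not something stated in this issue:

```python
# Sanity-check the TGI env values used in custom_image above.
# Assumed constraints (common TGI behavior, not from this issue):
#   - MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS
#   - MAX_BATCH_PREFILL_TOKENS should be at least MAX_INPUT_LENGTH
env = {
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_INPUT_LENGTH": "1024",
    "MAX_TOTAL_TOKENS": "1512",
}

max_input = int(env["MAX_INPUT_LENGTH"])
max_total = int(env["MAX_TOTAL_TOKENS"])
max_prefill = int(env["MAX_BATCH_PREFILL_TOKENS"])

assert max_input < max_total, "no room left for generated tokens"
assert max_prefill >= max_input, "a full-length prompt would not fit in one prefill"

# Tokens left for generation per request:
print(max_total - max_input)  # -> 488
```

With these values the configuration itself looks internally consistent, which points away from the token budget as the cause.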
Test Code Using `requests`

```python
import requests

# URL of the endpoint created above
endpoint_url = endpoint.url

# Define the input data
data = {"inputs": "What is 4*2 ?"}

# Define the headers, including the Authorization token
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
}

# Send the POST request
response = requests.post(endpoint_url, json=data, headers=headers)

# Print the response
print(response.json())
```
Output with Custom Image
```python
[
  {
    'generated_text': 'What is 4*2 ? The integral is not 0 for ν ≤ 3/2. Any such τ will allow us to express (7b) in a form of a quadratic function of the twists τ x : Z z → Z y , which turns out to be a constant ...'
  }
]
```
Output without Custom Image
```python
[
  {
    'generated_text': 'What is 4*2 ? To solve the problem of multiplying two numbers, you simply multiply them together. In this case, we...'
  }
]
```
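Note that both raw `requests` outputs above echo the prompt at the start of `generated_text`. When comparing outputs across clients, it helps to strip that echo first; a small helper for this (a sketch, `strip_prompt` is a hypothetical name, not part of any library):

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt, if present, so outputs from different
    clients can be compared on the generated continuation alone."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

# Applied to the (truncated) output without the custom image:
text = "What is 4*2 ? To solve the problem of multiplying two numbers, you simply multiply them together."
print(strip_prompt(text, "What is 4*2 ?"))
# -> "To solve the problem of multiplying two numbers, you simply multiply them together."
```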
Inconsistency Between `requests` and `InferenceClient`

```python
from huggingface_hub import InferenceClient

client = InferenceClient(endpoint_url, token="<token>")
output = client.text_generation(prompt="What is 4*2 ?", details=True)
print(output)
```

Output:

```
' - Brainly.com\\nprofile\\njessie1078\\njessie10'
```
Additional Information
- **Models Tested:**
  - Phi4
  - Qwen7B-Instruct
- **Custom Image URLs Tested:**
  - `ghcr.io/huggingface/text-generation-inference:1.1.1`
  - `ghcr.io/huggingface/text-generation-inference:3.3.1`
  - `ghcr.io/huggingface/text-generation-inference:latest`
Expected Behavior
- The model should return a consistent and accurate response to the input prompt.
- There should be no significant difference in the output when using `requests` and `InferenceClient`.
Environment
- Python Version: 3.10.16
- huggingface_hub Version: 0.27.1 (also tested on 0.33.1)
- requests Version: 2.32.3
Steps to Reproduce
1. Set up the inference endpoint with the provided code.
2. Use the `requests` library to send a POST request to the endpoint.
3. Use the `InferenceClient` to send a request to the endpoint.
4. Observe the inconsistent and hallucinatory outputs.
Additional Notes
- The issue persists across different models and custom image versions (Qwen-1.5B-Instruct, Qwen-7B-Instruct, Phi4).
- The problem started recently, suggesting a potential change in the inference API or custom image configuration.
Request
Could you please investigate this issue and provide a solution or workaround to ensure consistent and accurate model outputs?
Thank you for your attention and support.