
Endpoints created with a custom image using create_inference_endpoint from huggingface_hub return inconsistent and hallucinatory outputs #3184

@Sanahm

Issue Description

Context:
I am using the Hugging Face Inference API to deploy models for a leaderboard, via the huggingface_hub library with a custom image for text generation. Everything was working fine until recently. Now the model either hallucinates or the Inference API returns a sequence of random words in the "generated_text" field.

Problem Description

  1. Hallucination or Random Words:

    • When using a custom image, the model returns hallucinatory or random text.
    • Without the custom image, the output is more reasonable but still inconsistent.
  2. Inconsistency Between requests and InferenceClient:

    • There is an inconsistency in the output when using the requests library and the InferenceClient.

Code to Reproduce the Error

Setting Up the Endpoint

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "test-qwenz",
    repository="Qwen/Qwen2.5-0.5B-Instruct",
    framework="pytorch",
    namespace="<xxxxxxxx>",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_INPUT_LENGTH": "1024",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository"
        },
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
    },
)
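A note on the setup above: the object returned by create_inference_endpoint must finish provisioning before it can serve requests, and the env values bound the generation budget. A minimal sketch, assuming the wait()/url API of huggingface_hub's InferenceEndpoint:

```python
# Sketch (assumes huggingface_hub's InferenceEndpoint API): the endpoint
# is not usable until it reports "running".
#   endpoint.wait()              # block until provisioning completes
#   endpoint_url = endpoint.url  # base URL used by the tests below

# Sanity check on the env values above: with MAX_INPUT_LENGTH=1024 and
# MAX_TOTAL_TOKENS=1512, at most 1512 - 1024 = 488 tokens can be
# generated per request before TGI truncates or rejects the input.
max_input_length = 1024
max_total_tokens = 1512
generation_budget = max_total_tokens - max_input_length
print(generation_budget)  # 488
```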

Test Code Using requests

import requests

# URL of the endpoint created above (InferenceEndpoint exposes .url)
endpoint_url = endpoint.url

# Define the input data
data = {
    "inputs": "What is 4*2 ?"
}

# Define the headers, including the Authorization token
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

# Send the POST request
response = requests.post(endpoint_url, json=data, headers=headers)

# Print the response
print(response.json())

Output with Custom Image

[
    {
        'generated_text': 'What is 4*2 ? The integral is not 0 for ν ≤ 3/2. Any such τ will allow us to express (7b) in a form of a quadratic function of the twists τ x : Z z → Z y , which turns out to be a constant ...'
    }
]

Output without Custom Image

[
    {
        'generated_text': 'What is 4*2 ? To solve the problem of multiplying two numbers, you simply multiply them together. In this case, we...'
    }
]
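One thing worth ruling out is sampling: with no "parameters" field in the payload, TGI falls back to its own defaults. A hedged sketch of a payload that pins the generation parameters so repeated runs are comparable (parameter names follow the text-generation-inference /generate schema; the specific values are illustrative assumptions, not from the report):

```python
# Illustrative payload with explicit TGI generation parameters.
# The values here are assumptions chosen for repeatability.
data = {
    "inputs": "What is 4*2 ?",
    "parameters": {
        "do_sample": False,        # greedy decoding -> repeatable output
        "max_new_tokens": 64,      # cap the completion length
        "return_full_text": False  # do not echo the prompt back
    },
}
print(sorted(data["parameters"]))
```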

Inconsistency Between requests and InferenceClient

from huggingface_hub import InferenceClient

client = InferenceClient(endpoint_url, token="<token>")

output = client.text_generation(
    prompt="What is 4*2 ?", details=True
)

print(output)

Output:

' - Brainly.com\\nprofile\\njessie1078\\njessie10'
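A useful debugging step is to compare what the two code paths actually send. The raw requests call above posts only "inputs", while InferenceClient.text_generation builds a richer body. The client_payload below is an approximation based on the TGI /generate schema, not a dump of the real request:

```python
prompt = "What is 4*2 ?"

# What the raw requests call sends (as in the report).
raw_payload = {"inputs": prompt}

# Approximation of what InferenceClient.text_generation(details=True)
# sends (assumption based on the TGI /generate schema).
client_payload = {
    "inputs": prompt,
    "parameters": {"details": True},
}

# Keys present in one request but not the other are the first place to
# look when the two paths return different text.
extra_keys = set(client_payload) - set(raw_payload)
print(sorted(extra_keys))  # ['parameters']
```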

Additional Information

  • Models Tested:

    • Phi4
    • Qwen7B-Instruct
  • Custom Image URLs Tested:

    • ghcr.io/huggingface/text-generation-inference:1.1.1
    • ghcr.io/huggingface/text-generation-inference:3.3.1
    • ghcr.io/huggingface/text-generation-inference:latest

Expected Behavior

  • The model should return a consistent and accurate response to the input prompt.
  • There should be no significant difference in the output when using requests and InferenceClient.

Environment

  • Python Version: 3.10.16
  • huggingface_hub Version: 0.27.1 (tested also on the 0.33.1)
  • requests Version: 2.32.3

Steps to Reproduce

  1. Set up the inference endpoint with the provided code.
  2. Use the requests library to send a POST request to the endpoint.
  3. Use the InferenceClient to send a request to the endpoint.
  4. Observe the inconsistent and hallucinatory outputs.

Additional Notes

  • The issue persists across different models and custom image versions (Qwen-1.5B-Instruct, Qwen-7B-Instruct, Phi4).
  • The problem started recently, suggesting a potential change in the inference API or custom image configuration.

Request

Could you please investigate this issue and provide a solution or workaround to ensure consistent and accurate model outputs?

Thank you for your attention and support.

