AgentQnA: update instructions and env variable names for remote endpoints #2113

Merged

merged 6 commits on Aug 11, 2025

Changes from all commits
26 changes: 16 additions & 10 deletions AgentQnA/docker_compose/intel/cpu/xeon/README.md
```bash
export no_proxy=localhost,127.0.0.1,$host_ip # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
```

#### [Optional] OPENAI_API_KEY to use OpenAI models or LLM models with remote endpoints

To use OpenAI models, generate a key following these [instructions](https://platform.openai.com/api-keys).

When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to [Intel® AI for Enterprise Inference](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html) offerings.

Then set the environment variable `OPENAI_API_KEY` with the key contents:
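A minimal example, assuming the key was obtained in one of the steps above (the value shown is a placeholder):

```bash
# Replace the placeholder with the actual key contents
export OPENAI_API_KEY=<your-api-key>
```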

Then source the environment setup script:

```bash
source $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon/set_env.sh
```

We make it convenient to launch the whole system with docker compose, which includes microservices for LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. There are three docker compose files, making it easy for users to pick and choose: users can choose a retrieval tool other than the `DocIndexRetriever` example provided in our GenAIExamples repo, and can choose not to launch the telemetry containers.

On Xeon, OpenAI models and models deployed on a remote server are supported. Both methods require an API key, with `OPENAI_API_KEY` set in the [previous step](#optional-openai_api_key-to-use-openai-models-or-llm-models-with-remote-endpoints).

```bash
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
```

The command below will launch the multi-agent system with the `DocIndexRetriever` retrieval tool:

```bash
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml up -d
```

#### Models on Remote Servers

When models are deployed on a remote server with Intel® AI for Enterprise Inference, a base URL and an API key are required to access them. To run the Agent microservice on Xeon while using models deployed on a remote server, add `compose_remote.yaml` to the `docker compose` command and set additional environment variables.

> **Note**: For AgentQnA, the minimum hardware requirement for the remote server is Intel® Gaudi® AI Accelerators.

Set the following environment variables.

- `REMOTE_ENDPOINT` is the HTTPS endpoint of the remote server hosting the model of choice (e.g. https://api.example.com). **Note:** If the API serving the models does not use LiteLLM, append the second part of the model card to the URL. For example, set `REMOTE_ENDPOINT` to https://api.example.com/Llama-3.3-70B-Instruct if the model card is `meta-llama/Llama-3.3-70B-Instruct`.
- `model` is the model card, which may need to be overwritten depending on what it is set to in `set_env.sh`.

```bash
export REMOTE_ENDPOINT=<https-endpoint-of-remote-server>
export model=<model-card>
```
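For illustration, here are the same variables filled in with the example values from the note above (a hypothetical endpoint; substitute the values for your deployment):

```bash
# Hypothetical example: an endpoint serving meta-llama/Llama-3.3-70B-Instruct via LiteLLM
export REMOTE_ENDPOINT=https://api.example.com
export model=meta-llama/Llama-3.3-70B-Instruct
```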

After setting these environment variables, run `docker compose` by adding `compose_remote.yaml` as an additional YAML file:

```bash
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml -f compose_remote.yaml up -d
```
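Once the containers are up, a quick sanity check is to list their status (a sketch; exact container names depend on the compose files used):

```bash
# All agent, retrieval, and UI containers should report an "Up" status
docker ps --format 'table {{.Names}}\t{{.Status}}'
```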

6 changes: 3 additions & 3 deletions AgentQnA/docker_compose/intel/cpu/xeon/compose_remote.yaml
services:
  worker-rag-agent:
    environment:
      llm_endpoint_url: ${REMOTE_ENDPOINT}
      api_key: ${OPENAI_API_KEY}

  worker-sql-agent:
    environment:
      llm_endpoint_url: ${REMOTE_ENDPOINT}
      api_key: ${OPENAI_API_KEY}

  supervisor-react-agent:
    environment:
      llm_endpoint_url: ${REMOTE_ENDPOINT}
      api_key: ${OPENAI_API_KEY}
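Since `compose_remote.yaml` only overrides environment variables on the agent services, one way to verify that the overrides resolve as intended is to render the merged configuration. A sketch, assuming the same working directory and files as the launch command above:

```bash
# Print the merged compose config and inspect the resolved endpoint values
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml \
  -f compose_openai.yaml -f compose_remote.yaml config | grep llm_endpoint_url
```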