AgentQnA/docker_compose/intel/cpu/xeon/README.md
export no_proxy=localhost,127.0.0.1,$host_ip # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
```

#### [Optional] OPENAI_API_KEY to use OpenAI models or LLM models with remote endpoints

To use OpenAI models, generate a key following these [instructions](https://platform.openai.com/api-keys).

When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to [Intel® AI for Enterprise Inference](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html) offerings.

Then set the environment variable `OPENAI_API_KEY` with the key contents:
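For example (the value below is a placeholder; substitute the actual key issued to you):

```shell
# Placeholder value -- replace with the actual key from OpenAI or your inference provider.
export OPENAI_API_KEY="your-api-key-contents"
```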
We make it convenient to launch the whole system with Docker Compose, including microservices for the LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. Three compose files let users pick and choose: for example, users can swap in a retrieval tool other than the `DocIndexRetriever` example provided in the GenAIExamples repo, or skip launching the telemetry containers.

On Xeon, OpenAI models and models deployed on a remote server are supported. Both methods require an API key; `OPENAI_API_KEY` needs to be set in the [previous step](#optional-openai_api_key-to-use-openai-models-or-llm-models-with-remote-endpoints).

The command below will launch the multi-agent system with the `DocIndexRetriever`

```
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml up -d
```

#### Models on Remote Servers

When models are deployed on a remote server with Intel® AI for Enterprise Inference, a base URL and an API key are required to access them. To run the Agent microservice on Xeon while using models deployed on a remote server, add `compose_remote.yaml` to the `docker compose` command and set additional environment variables.

> **Note**: For AgentQnA, the minimum hardware requirement for the remote server is Intel® Gaudi® AI Accelerators.

Set the following environment variables:

- `REMOTE_ENDPOINT` is the HTTPS endpoint of the remote server with the model of choice (e.g. https://api.example.com). **Note:** If the API for the models does not use LiteLLM, the second part of the model card needs to be appended to the URL. For example, set `REMOTE_ENDPOINT` to https://api.example.com/Llama-3.3-70B-Instruct if the model card is `meta-llama/Llama-3.3-70B-Instruct`.
- `model` is the model card, which may need to be overwritten depending on what it is set to in `set_env.sh`.
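
As a sketch, the exports and launch command might look like the following. The endpoint URL and model card are illustrative placeholders, and the exact combination of compose files should match the OpenAI launch command earlier in this README with `compose_remote.yaml` added, per your deployment:

```shell
# Illustrative values -- replace with the base URL and model card for your deployment.
export REMOTE_ENDPOINT="https://api.example.com"
export model="meta-llama/Llama-3.3-70B-Instruct"

# Launch with the remote-endpoint overrides layered on top of the base compose files.
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml \
  -f compose_openai.yaml -f compose_remote.yaml up -d
```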