diff --git a/AgentQnA/docker_compose/intel/cpu/xeon/README.md b/AgentQnA/docker_compose/intel/cpu/xeon/README.md
index f6af6b9757..e327cb43ee 100644
--- a/AgentQnA/docker_compose/intel/cpu/xeon/README.md
+++ b/AgentQnA/docker_compose/intel/cpu/xeon/README.md
@@ -52,11 +52,11 @@ export no_proxy=localhost,127.0.0.1,$host_ip # additional no proxies if needed
 export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
 ```
 
-#### [Optional] OPENAI_API_KEY to use OpenAI models or Intel® AI for Enterprise Inference
+#### [Optional] OPENAI_API_KEY to use OpenAI models or LLM models with remote endpoints
 
 To use OpenAI models, generate a key following these [instructions](https://platform.openai.com/api-keys).
 
-To use a remote server running Intel® AI for Enterprise Inference, contact the cloud service provider or owner of the on-prem machine for a key to access the desired model on the server.
+When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to [Intel® AI for Enterprise Inference](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html) offerings.
 
 Then set the environment variable `OPENAI_API_KEY` with the key contents:
 
@@ -74,7 +74,7 @@ source $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon/set_env.sh
 
 We make it convenient to launch the whole system with docker compose, which includes microservices for LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. There are 3 docker compose files, which make it easy for users to pick and choose. Users can choose a different retrieval tool other than the `DocIndexRetriever` example provided in our GenAIExamples repo. Users can choose not to launch the telemetry containers.
 
-On Xeon, OpenAI models and models deployed on a remote server are supported. Both methods require an API key.
+On Xeon, OpenAI models and models deployed on a remote server are supported. Both methods require an API key, so `OPENAI_API_KEY` needs to be set in the [previous step](#optional-openai_api_key-to-use-openai-models-or-llm-models-with-remote-endpoints).
 
 ```bash
 cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
@@ -88,19 +88,25 @@ The command below will launch the multi-agent system with the `DocIndexRetriever
 docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml up -d
 ```
 
-#### Models on Remote Server
+#### Models on Remote Servers
 
 When models are deployed on a remote server with Intel® AI for Enterprise Inference, a base URL and an API key are required to access them. To run the Agent microservice on Xeon while using models deployed on a remote server, add `compose_remote.yaml` to the `docker compose` command and set additional environment variables.
 
-#### Notes
+> **Note**: For AgentQnA, the minimum hardware requirement for the remote server is Intel® Gaudi® AI Accelerators.
 
-- `OPENAI_API_KEY` is already set in a previous step.
-- `model` is used to overwrite the value set for this environment variable in `set_env.sh`.
-- `LLM_ENDPOINT_URL` is the base URL given from the owner of the on-prem machine or cloud service provider. It will follow this format: "https://". Here is an example: "https://api.inference.example.com".
+Set the following environment variables:
+
+- `REMOTE_ENDPOINT` is the HTTPS endpoint of the remote server with the model of choice (e.g., https://api.example.com). **Note:** If the API for the models does not use LiteLLM, the part of the model card after the `/` needs to be appended to the URL. For example, set `REMOTE_ENDPOINT` to https://api.example.com/Llama-3.3-70B-Instruct if the model card is `meta-llama/Llama-3.3-70B-Instruct`.
+- `model` is the model card, which may need to be overwritten depending on what it is set to in `set_env.sh`.
+
+```bash
+export REMOTE_ENDPOINT=
+export model=
+```
+
+After setting these environment variables, run `docker compose` by adding `compose_remote.yaml` as an additional YAML file:
 
 ```bash
-export model=
-export LLM_ENDPOINT_URL=
 docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml -f compose_remote.yaml up -d
 ```
 
diff --git a/AgentQnA/docker_compose/intel/cpu/xeon/compose_remote.yaml b/AgentQnA/docker_compose/intel/cpu/xeon/compose_remote.yaml
index 24536435a3..50da3cf736 100644
--- a/AgentQnA/docker_compose/intel/cpu/xeon/compose_remote.yaml
+++ b/AgentQnA/docker_compose/intel/cpu/xeon/compose_remote.yaml
@@ -4,15 +4,15 @@
 services:
   worker-rag-agent:
     environment:
-      llm_endpoint_url: ${LLM_ENDPOINT_URL}
+      llm_endpoint_url: ${REMOTE_ENDPOINT}
       api_key: ${OPENAI_API_KEY}
 
   worker-sql-agent:
     environment:
-      llm_endpoint_url: ${LLM_ENDPOINT_URL}
+      llm_endpoint_url: ${REMOTE_ENDPOINT}
       api_key: ${OPENAI_API_KEY}
 
   supervisor-react-agent:
     environment:
-      llm_endpoint_url: ${LLM_ENDPOINT_URL}
+      llm_endpoint_url: ${REMOTE_ENDPOINT}
       api_key: ${OPENAI_API_KEY}
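
Before launching the full stack, it can help to confirm that `REMOTE_ENDPOINT` and `OPENAI_API_KEY` actually reach the model. The snippet below is a minimal smoke test, not part of the documented setup; it assumes the remote server exposes an OpenAI-compatible chat completions route (as LiteLLM-fronted deployments typically do), and the path may differ for other gateways.

```bash
# Hypothetical smoke test: send one chat request to the remote endpoint.
# Assumes an OpenAI-compatible /v1/chat/completions route behind REMOTE_ENDPOINT;
# adjust the path if the gateway exposes a different one.
curl -sS "${REMOTE_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
```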
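Since `compose_remote.yaml` only overrides `llm_endpoint_url` and `api_key` for the three agent services, a quick sanity check is to render the merged compose configuration before starting containers. This is a generic `docker compose config` check added here for illustration, not a step the README requires.

```bash
# Render the merged configuration and confirm the agents pick up REMOTE_ENDPOINT.
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
docker compose \
  -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml \
  -f compose_openai.yaml -f compose_remote.yaml config | grep llm_endpoint_url
```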