# Adding files to deploy FinanceAgent application on ROCm vLLM #1890
Closed

artem-astafev wants to merge 36 commits into opea-project:main from artem-astafev:feature/FinaceAgent-on-AMD-ROCm-example

Commits (36):
- fd3824d Add Config for vLLM (artem-astafev)
- ee74383 Update compose_vllm.yaml (artem-astafev)
- 6c50388 Update compose_vllm.yaml (artem-astafev)
- f762e43 Update example config (artem-astafev)
- 277a698 Update set_env_vllm.sh (artem-astafev)
- c5e3ab2 Update set_env_vllm.sh (artem-astafev)
- 14c2fbb Update set_env_vllm.sh (artem-astafev)
- a4f154f Update compose_vllm.yaml (artem-astafev)
- 8cf4025 Update compose_vllm.yaml (artem-astafev)
- b38ec3e Refactor FinanceAgent for rocm (artem-astafev)
- aaf7f86 adjust rocm example (artem-astafev)
- b9c3b45 Update test_compose_on_vllm_rocm.sh (artem-astafev)
- 31371e3 Adjust example config (artem-astafev)
- 5176581 Update compose.yaml (artem-astafev)
- 29447e2 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- 490ec13 Add README.md for AMD ROCm deployment (artem-astafev)
- e421ca3 Merge branch 'feature/FinaceAgent-on-AMD-ROCm-example' of https://git… (artem-astafev)
- 6569374 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 77a8e85 Update README.md for AMD ROCm (artem-astafev)
- 4fa8093 Merge branch 'feature/FinaceAgent-on-AMD-ROCm-example' of https://git… (artem-astafev)
- ec64548 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- f4763d9 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- d2d4172 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- e3de103 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- fa8cf00 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- 9915173 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- de7c65b Rename tests file for AMD ROCm (artem-astafev)
- 69a7387 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- 86c453f Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (artem-astafev)
- eafda50 Update launch_vllm.sh (artem-astafev)
- 31be885 Merge branch 'feature/FinaceAgent-on-AMD-ROCm-example' of https://git… (artem-astafev)
- b690eeb Update launch_vllm.sh (artem-astafev)
- 92d160b Adjust tests (artem-astafev)
- 9d79858 Update test_compose_vllm_on_rocm.sh (artem-astafev)
- e4ed752 Fix tests (artem-astafev)
- 25a9c63 Merge branch 'main' into feature/FinaceAgent-on-AMD-ROCm-example (chensuyue)
@@ -0,0 +1,188 @@

# Example Finance Agent deployments on AMD GPU (ROCm)

This document outlines the deployment process for a Finance Agent application utilizing OPEA components on an AMD GPU server.

This example includes the following sections:

- [Finance Agent Quick Start Deployment](#finance-agent-quick-start-deployment): Demonstrates how to quickly deploy a Finance Agent application/pipeline on an AMD GPU platform.
- [Finance Agent Docker Compose Files](#finance-agent-docker-compose-files): Describes some example deployments and their Docker Compose files.
- [How to interact with the agent system with UI](#how-to-interact-with-the-agent-system-with-ui): Guidelines for UI usage.

## Finance Agent Quick Start Deployment

This section describes how to quickly deploy and test the Finance Agent service manually on an AMD GPU platform. The basic steps are:

1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Test the Pipeline](#test-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)

### Access the Code

Clone the GenAIExamples repository and access the FinanceAgent AMD GPU platform Docker Compose files and supporting scripts:
```
mkdir /path/to/your/workspace/
export WORKDIR=/path/to/your/workspace/
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```

Check out a released version, such as v1.4:

```
git checkout v1.4
```

### Generate a HuggingFace Access Token

Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
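The deployment scripts read the token from an environment variable. A minimal sketch (the variable name `HUGGINGFACEHUB_API_TOKEN` matches what the compose files in this PR consume; the placeholder value is yours to fill in):

```bash
# Make the token available to the launch scripts and docker compose.
export HUGGINGFACEHUB_API_TOKEN="hf_..."  # replace with your own token
```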
### Deploy the Services Using Docker Compose

#### 3.1 Launch the vLLM endpoint

Below is the command to launch a vLLM endpoint that serves the `meta-llama/Llama-3.3-70B-Instruct` model on the AMD ROCm platform.

```bash
cd $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/amd/gpu/rocm
bash launch_vllm.sh
```
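Loading a 70B checkpoint can take a while. Once the service reports healthy, you can sanity-check the OpenAI-compatible API directly. A sketch, assuming the host port that vLLM is published on in your environment (8086 in the `docker ps` output later in this README, `${FINANCEAGENT_VLLM_SERVICE_PORT:-8081}` in `compose_vllm.yaml`):

```bash
# Query the vLLM OpenAI-compatible chat endpoint with a trivial prompt.
curl http://${HOST_IP}:8086/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32
      }'
```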
#### 3.2 Prepare the knowledge base

The commands below upload some example files into the knowledge base. You can also upload files through the UI.

First, launch the Redis databases and the dataprep microservice.

```bash
# inside $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/amd/gpu/rocm
bash launch_dataprep.sh
```

Validate data ingestion and retrieval from the database:

```bash
python $WORKDIR/GenAIExamples/FinanceAgent/tests/test_redis_finance.py --port 6007 --test_option ingest
python $WORKDIR/GenAIExamples/FinanceAgent/tests/test_redis_finance.py --port 6007 --test_option get
```
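To ingest your own documents without the UI, you can post them to the dataprep service directly. A sketch, assuming the dataprep container published on host port 6007 (as above) and the `/v1/dataprep/ingest` route used by recent OPEA dataprep images (older releases exposed a different route); the file name is a placeholder:

```bash
# Upload a local PDF to the knowledge base via the dataprep microservice.
curl -X POST "http://${HOST_IP}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./your_document.pdf"  # replace with your own file
```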
#### 3.3 Launch the multi-agent system

The command below launches 3 agent microservices, 1 DocSum microservice, and 1 UI microservice.

```bash
# inside $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/amd/gpu/rocm
bash launch_agents.sh
```
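If an agent misbehaves later, its container logs are the first place to look. For example, using the container names defined in `compose.yaml`:

```bash
# Follow the supervisor agent's logs to confirm it reaches the vLLM endpoint.
docker logs -f supervisor-agent-endpoint
```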
#### 3.4 Check the Deployment Status

After running docker compose, check if all the containers launched via docker compose have started:

```
docker ps -a
```

For the default deployment, the following six containers should have started:

```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7e61978c3d75 opea/dataprep:latest "sh -c 'python $( [ …" 31 seconds ago Up 19 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server-finance
0fee87aca791 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 3 hours ago Up 3 hours (healthy) 0.0.0.0:6380->6379/tcp, [::]:6380->6379/tcp, 0.0.0.0:8002->8001/tcp, [::]:8002->8001/tcp redis-kv-store
debd549045f8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 3 hours ago Up 3 hours (healthy) 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
9cff469364d3 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" 3 hours ago Up 3 hours (healthy) 0.0.0.0:10221->80/tcp, [::]:10221->80/tcp tei-embedding-serving
13f71e678dbd opea/vllm-rocm:latest "python3 /workspace/…" 3 hours ago Up 3 hours (healthy) 0.0.0.0:8086->8011/tcp, [::]:8086->8011/tcp vllm-service
e5a219a77c95 opea/llm-docsum:latest "bash entrypoint.sh" 3 hours ago Up 2 seconds 0.0.0.0:33218->9000/tcp, [::]:33218->9000/tcp docsum-llm-server
```
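A quick way to spot anything that exited or is still restarting, as a small sketch:

```bash
# Show only containers whose status is not "Up" (exited, restarting, etc.).
docker ps -a --format '{{.Names}}\t{{.Status}}' | grep -v 'Up'
```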
#### 3.5 Validate agents

FinQA Agent:

```bash
export agent_port="9095"
prompt="What is Gap's revenue in 2024?"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```

Research Agent:

```bash
export agent_port="9096"
prompt="generate NVDA financial research report"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port --tool_choice "get_current_date" --tool_choice "get_share_performance"
```

Supervisor Agent, single turn:

```bash
export agent_port="9090"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --stream
```

Supervisor Agent, multi turn:

```bash
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --multi-turn --stream
```
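The supervisor agent also exposes an OpenAI-compatible API (the same one the UI connects to in the section below), so you can exercise it without the test script. A sketch, where the model id is whichever arbitrary id you register in the UI settings:

```bash
# Stream a single question through the supervisor agent's chat endpoint.
curl -N http://${HOST_IP}:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "opea_agent",
        "messages": [{"role": "user", "content": "What is Gap revenue in 2024?"}],
        "stream": true
      }'
```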
### Cleanup the Deployment

To stop the containers associated with the deployment, execute the following commands:

```
docker compose -f compose.yaml down
docker compose -f compose_vllm.yaml down
docker compose -f dataprep_compose.yaml down
```

All the Finance Agent containers will be stopped and removed when the `down` commands complete.
## Finance Agent Docker Compose Files

When deploying a Finance Agent pipeline on an AMD GPU platform, you can pick and choose between different large language model serving frameworks. The table below outlines the configurations that are available as part of the application.

| File                                             | Description                                                                                      |
| ------------------------------------------------ | ------------------------------------------------------------------------------------------------ |
| [compose.yaml](./compose.yaml)                   | Default compose file to run the agent services                                                    |
| [compose_vllm.yaml](./compose_vllm.yaml)         | The LLM serving framework is vLLM.                                                                 |
| [dataprep_compose.yaml](./dataprep_compose.yaml) | Compose file to run the data prep services, such as the Redis vector DB, Reranker, and Embedder   |
## How to interact with the agent system with UI

The UI microservice is launched in the previous step along with the other microservices.
To access the UI, open a web browser and navigate to `http://${ip_address}:5175`. Note: the `ip_address` here is the host IP of the UI microservice.

1. Create an admin account; the sign-up values can be arbitrary.

2. Enter the endpoints in the `Connections` settings

   First, click on the user icon in the upper right corner to open `Settings`. Click on `Admin Settings`. Click on `Connections`.

   Then, enter the supervisor agent endpoint in the `OpenAI API` section: `http://${ip_address}:9090/v1`. Enter the API key as "empty". Add an arbitrary model id in `Model IDs`, for example, "opea_agent". The `ip_address` here should be the host IP of the agent microservice.

   Then, enter the dataprep endpoint in the `Icloud File API` section. You first need to enable `Icloud File API` by clicking the toggle on the right to turn it green, and then enter the endpoint URL, for example, `http://${ip_address}:6007/v1`. The `ip_address` here should be the host IP of the dataprep microservice.

   You should see a screen like the screenshot below when the settings are done.

   

3. Upload documents with the UI

   Click on the `Workplace` icon in the top left corner. Click `Knowledge`. Click on the "+" sign to the right of `Icloud Knowledge`. You can paste a URL in the left-hand side of the pop-up window, or upload a local file by clicking on the cloud icon on the right-hand side of the pop-up window. Then click on the `Upload Confirm` button. Wait until processing finishes; the pop-up window will close on its own when the data ingestion is done. See the screenshot below.

   Note: data ingestion may take a few minutes depending on the length of the document. Please wait patiently and do not close the pop-up window.

   

4. Test the agent with the UI

   After the settings are done and documents are ingested, you can start asking the agent questions. Click on the `New Chat` icon in the top left corner, and type your questions in the text box in the middle of the UI.

   The UI streams the agent's response tokens. Expand the `Thinking` tab to see the agent's reasoning process. When the agent makes tool calls, the tool output is shown once the tool returns it to the agent. Note: it may take a while to get the tool output back if the tool execution takes time.

   
@@ -0,0 +1,132 @@
```yaml
# Copyright (C) 2025 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

services:
  worker-finqa-agent:
    image: opea/agent:latest
    container_name: finqa-agent-endpoint
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
      - ${PROMPT_PATH}:/home/user/prompts/
    ports:
      - "9095:9095"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: false
      recursion_limit: ${recursion_limit_worker}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${TEMPERATURE}
      max_new_tokens: ${MAX_TOKENS}
      stream: false
      tools: /home/user/tools/finqa_agent_tools.yaml
      custom_prompt: /home/user/prompts/finqa_prompt.py
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL_VECTOR: $REDIS_URL_VECTOR
      REDIS_URL_KV: $REDIS_URL_KV
      TEI_EMBEDDING_ENDPOINT: $TEI_EMBEDDING_ENDPOINT
      port: 9095

  worker-research-agent:
    image: opea/agent:latest
    container_name: research-agent-endpoint
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
      - ${PROMPT_PATH}:/home/user/prompts/
    ports:
      - "9096:9096"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: false
      recursion_limit: 25
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      stream: false
      tools: /home/user/tools/research_agent_tools.yaml
      custom_prompt: /home/user/prompts/research_prompt.py
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      FINNHUB_API_KEY: ${FINNHUB_API_KEY}
      FINANCIAL_DATASETS_API_KEY: ${FINANCIAL_DATASETS_API_KEY}
      port: 9096

  supervisor-react-agent:
    image: opea/agent:latest
    container_name: supervisor-agent-endpoint
    depends_on:
      - worker-finqa-agent
      - worker-research-agent
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
      - ${PROMPT_PATH}:/home/user/prompts/
    ports:
      - "9090:9090"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: true
      recursion_limit: ${recursion_limit_supervisor}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${TEMPERATURE}
      max_new_tokens: ${MAX_TOKENS}
      stream: true
      tools: /home/user/tools/supervisor_agent_tools.yaml
      custom_prompt: /home/user/prompts/supervisor_prompt.py
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      WORKER_FINQA_AGENT_URL: $WORKER_FINQA_AGENT_URL
      WORKER_RESEARCH_AGENT_URL: $WORKER_RESEARCH_AGENT_URL
      DOCSUM_ENDPOINT: $DOCSUM_ENDPOINT
      REDIS_URL_VECTOR: $REDIS_URL_VECTOR
      REDIS_URL_KV: $REDIS_URL_KV
      TEI_EMBEDDING_ENDPOINT: $TEI_EMBEDDING_ENDPOINT
      port: 9090

  docsum-llm-textgen:
    image: ${REGISTRY:-opea}/llm-docsum:${TAG:-latest}
    container_name: docsum-llm-server
    ports:
      - "${DOCSUM_LLM_SERVER_PORT}:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LLM_ENDPOINT: ${LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
      MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      DocSum_COMPONENT_NAME: ${DocSum_COMPONENT_NAME:-OpeaDocSumvLLM}
      LOGFLAG: ${LOGFLAG:-False}
    restart: unless-stopped

  agent-ui:
    image: opea/agent-ui:latest
    container_name: agent-ui
    environment:
      host_ip: ${host_ip}
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    ports:
      - "5175:8080"
    ipc: host
```
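This compose file expects a number of environment variables (`TOOLSET_PATH`, `PROMPT_PATH`, `LLM_ENDPOINT_URL`, `LLM_MODEL_ID`, the recursion limits, and so on) to be present in the shell, which the `set_env*.sh` scripts in this PR are responsible for. A hand-rolled sketch of the minimum set, with illustrative values only (the real ones come from the scripts):

```bash
# Illustrative values; consult set_env_vllm.sh in this PR for the real ones.
export host_ip=$(hostname -I | awk '{print $1}')
export ip_address=${host_ip}
export HUGGINGFACEHUB_API_TOKEN="hf_..."          # your HuggingFace token
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct"
export LLM_ENDPOINT_URL="http://${host_ip}:8086"  # vLLM host port may differ
export TOOLSET_PATH=$WORKDIR/GenAIExamples/FinanceAgent/tools/
export PROMPT_PATH=$WORKDIR/GenAIExamples/FinanceAgent/prompts/
export recursion_limit_worker=12                  # illustrative
export recursion_limit_supervisor=10              # illustrative
export TEMPERATURE=0.5                            # illustrative
export MAX_TOKENS=4096                            # illustrative
```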
FinanceAgent/docker_compose/amd/gpu/rocm/compose_vllm.yaml (39 additions, 0 deletions)

@@ -0,0 +1,39 @@
```yaml
# Copyright (C) 2025 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

services:
  vllm-service:
    image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
    container_name: vllm-service
    ports:
      - "${FINANCEAGENT_VLLM_SERVICE_PORT:-8081}:8011"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      VLLM_USE_TRITON_FLASH_ATTENTION: 0
      PYTORCH_JIT: 0
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://${HOST_IP}:${FINANCEAGENT_VLLM_SERVICE_PORT:-8081}/health || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 100
    volumes:
      - "${MODEL_CACHE:-./data}:/data"
    shm_size: 20G
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/:/dev/dri/
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
      - apparmor=unconfined
    command: "--model ${LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\""
    ipc: host
```
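Note that `--tensor-parallel-size 4` shards the 70B model across four GPUs, so the host needs at least four ROCm devices visible through `/dev/kfd` and `/dev/dri` (check with `rocm-smi`); reduce the value for smaller models or fewer GPUs. A minimal manual launch sketch, equivalent in spirit to what `launch_vllm.sh` automates (assumed), polling the same URL the compose healthcheck uses:

```bash
docker compose -f compose_vllm.yaml up -d

# Block until the vLLM healthcheck endpoint answers.
until curl -sf "http://${HOST_IP}:${FINANCEAGENT_VLLM_SERVICE_PORT:-8081}/health"; do
  echo "waiting for vllm-service..."; sleep 10
done
```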
Review comment (on the quick-start step list): "Reference cannot be redirected"

> 3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
> 4. [Check the Deployment Status](#check-the-deployment-status)