
Commit c785a2c

multiple BLAS backends support (#17)
1 parent d37e921 commit c785a2c

File tree

3 files changed: +27 -2 lines changed

.github/workflows/publish-release.yml

Lines changed: 17 additions & 0 deletions
@@ -11,6 +11,19 @@ on:
 jobs:
   push_to_dockerhub:
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        include:
+          - suffix:
+            cmake_args: ""
+          - suffix: -openblas
+            cmake_args: "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
+          - suffix: -cublas
+            cmake_args: "-DLLAMA_CUBLAS=on"
+          - suffix: -clblast
+            cmake_args: "-DLLAMA_CLBLAST=on"
+          - suffix: -hipblas
+            cmake_args: "-DLLAMA_HIPBLAS=on"
     steps:
       - name: Checkout
         uses: actions/checkout@v3
@@ -32,6 +45,8 @@ jobs:
         uses: docker/metadata-action@v4
         with:
           images: 1b5d/llm-api
+          flavor: |
+            suffix=${{ matrix.suffix }},onlatest=true
 
       - name: Build and push
         uses: docker/build-push-action@v4
@@ -43,6 +58,8 @@ jobs:
           labels: ${{ steps.meta.outputs.labels }}
           cache-from: type=registry,ref=1b5d/llm-api:latest
           cache-to: type=inline
+          build-args: |
+            "CMAKE_ARGS=${{ matrix.cmake_args }}"
 
   push_gpu_to_dockerhub:
     runs-on: ubuntu-latest
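With `onlatest=true` in the metadata flavor, each matrix suffix is appended to the `latest` tag as well, so this matrix publishes one image per BLAS backend. A quick sketch of pulling the resulting variants (tag names match the README section updated below):

```
# Tags produced by the build matrix: one per BLAS backend, suffixed onto "latest"
docker pull 1b5d/llm-api:latest            # default build, no BLAS backend
docker pull 1b5d/llm-api:latest-openblas   # OpenBLAS
docker pull 1b5d/llm-api:latest-cublas     # cuBLAS
docker pull 1b5d/llm-api:latest-clblast    # CLBlast
docker pull 1b5d/llm-api:latest-hipblas    # hipBLAS
```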

Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@ WORKDIR /llm-api
 
 COPY ./requirements.txt /llm-api/requirements.txt
 ENV FORCE_CMAKE "1"
-ENV CMAKE_ARGS "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
+ARG CMAKE_ARGS
+ENV CMAKE_ARGS=${CMAKE_ARGS:-""}
 
 RUN pip install --no-cache-dir --upgrade -r requirements.txt
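Because `CMAKE_ARGS` is now a build argument with an empty default, a specific backend can also be built locally rather than pulled from Docker Hub. A minimal sketch, assuming a locally chosen tag name:

```
# Build a cuBLAS-enabled image locally; the Dockerfile forwards CMAKE_ARGS to
# the pip install step (FORCE_CMAKE=1 is already set in the image).
docker build --build-arg CMAKE_ARGS="-DLLAMA_CUBLAS=on" -t llm-api:local-cublas .
```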

README.md

Lines changed: 8 additions & 1 deletion
@@ -186,6 +186,13 @@ model_params:
 
 Ensure to specify the repo_id and filename parameters to point to a Hugging Face repository where the desired model is hosted. The application will then handle the download for you.
 
+Running in this mode can be done using the docker image `1b5d/llm-api:latest`, several images are also available to support different BLAS backends:
+- OpenBLAS: `1b5d/llm-api:latest-openblas`
+- cuBLAS: `1b5d/llm-api:latest-cublas`
+- CLBlast: `1b5d/llm-api:latest-clblast`
+- hipBLAS: `1b5d/llm-api:latest-hipblas`
+
+
 The following example demonstrates the various parameters that can be sent to the Llama generate and agenerate endpoints:
 
 ```
@@ -246,7 +253,7 @@ docker compose -f docker-compose.gpu.yaml up
 **Important Note**: Before running Llama or Llama 2 on GPU, make sure to install the [NVIDIA Driver](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html) on your host machine. You can verify the NVIDIA environment by executing the following command:
 
 ```
-docker run --rm --gpus all nvidia/cuda:11.7.1-base-ubuntu20.04 nvidia-smi
+docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
 ```
 
 You should see a table displaying the current NVIDIA driver version and related information, confirming the proper setup.
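The suffixed tags are built from the same Dockerfile and differ only in the BLAS backend compiled in, so they can stand in for `1b5d/llm-api:latest`. A hedged sketch of running the OpenBLAS variant; the port and config mount path are assumptions, not taken from this diff:

```
# Assumed invocation: mount a config file into the image's /llm-api workdir,
# expose the API port, and use the OpenBLAS-backed tag instead of plain latest.
docker run --rm \
  -v "$PWD/config.yaml:/llm-api/config.yaml" \
  -p 8000:8000 \
  1b5d/llm-api:latest-openblas
```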
