Commit 5c7adc4

【Inference】Fix qwen2.5-vl and deepseek-vl2 high-performance inference (#1175)
Co-authored-by: nifeng <nemonameless@qq.com>
1 parent 75ac35e commit 5c7adc4

8 files changed: +1075 -85 lines changed

deploy/deepseek_vl2/README.md

Lines changed: 37 additions & 30 deletions
@@ -6,31 +6,29 @@
66
| Model |
77
|---------------------------------|
88
| deepseek-ai/deepseek-vl2-small |
9-
| deepseek-ai/deepseek-vl2 |
109

1110
## Environment setup
1211
[Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
1312
- **python >= 3.10**
1413
- **paddlepaddle-gpu: the develop version is required**
1514
```bash
16-
# Example install of the develop version; make sure the Paddle version in use is the develop version
15+
# Example install of the develop version
1716
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
1817
```
1918

2019
2) [Install the PaddleMIX environment dependencies](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
2120
```bash
2221
# Example pip install: installs paddlemix, ppdiffusers, and the project dependencies
23-
python -m pip install -e .
22+
python -m pip install -e .
2423
python -m pip install -e ppdiffusers
2524
python -m pip install -r requirements.txt
2625

2726
# Install PaddleNLP
28-
pip uninstall paddlenlp && rm -rf PaddleNLP
29-
git clone https://github.com/PaddlePaddle/PaddleNLP.git
27+
pip uninstall -y paddlenlp && rm -rf PaddleNLP
28+
git clone -b release/3.0-beta4-new --depth=1 https://github.com/PaddlePaddle/PaddleNLP.git
3029
cd PaddleNLP
3130
pip install -e .
32-
cd csrc
33-
python setup_cuda.py install
31+
pip install https://paddlenlp.bj.bcebos.com/ops/cu118/paddlenlp_ops-3.0.0b4-py3-none-any.whl
3432
```
3533

3634
## 3 High-performance inference
@@ -39,56 +37,72 @@ python setup_cuda.py install
3937

4038
```
4139
export CUDA_VISIBLE_DEVICES=0
42-
export FLAGS_cascade_attention_max_partition_size=163840
43-
export FLAGS_mla_use_tensorcore=1
40+
export FLAGS_mla_use_tensorcore=0
41+
export FLAGS_cascade_attention_max_partition_size=128
42+
export FLAGS_cascade_attention_deal_each_time=16
4443
python deploy/deepseek_vl2/deepseek_vl2_infer.py \
4544
--model_name_or_path deepseek-ai/deepseek-vl2-small \
4645
--question "Describe this image." \
4746
--image_file paddlemix/demo_images/examples_image1.jpg \
4847
--min_length 128 \
4948
--max_length 128 \
49+
--inference_model True \
50+
--append_attn True \
51+
--mode dynamic \
52+
--dtype bfloat16 \
5053
--top_k 1 \
5154
--top_p 0.001 \
5255
--temperature 0.1 \
5356
--repetition_penalty 1.05 \
54-
--block_attn True \
57+
--benchmark
58+
59+
# Multi-image inference
60+
python deploy/deepseek_vl2/deepseek_vl2_infer_multi_image.py \
61+
--model_name_or_path deepseek-ai/deepseek-vl2-small \
62+
--question "What are in these images." \
63+
--image_file_1 paddlemix/demo_images/examples_image1.jpg \
64+
--image_file_2 paddlemix/demo_images/examples_image2.jpg \
65+
--image_file_3 paddlemix/demo_images/examples_image1.jpg \
66+
--min_length 128 \
67+
--max_length 128 \
5568
--inference_model True \
5669
--append_attn True \
5770
--mode dynamic \
5871
--dtype bfloat16 \
59-
--mla_use_matrix_absorption
72+
--top_k 1 \
73+
--top_p 0.001 \
74+
--temperature 0.1 \
75+
--repetition_penalty 1.05 \
76+
--benchmark
6077
```
6178

6279
### b. wint8 high-performance inference
6380
```
6481
export CUDA_VISIBLE_DEVICES=0
65-
export FLAGS_cascade_attention_max_partition_size=163840
66-
export FLAGS_mla_use_tensorcore=1
82+
export FLAGS_mla_use_tensorcore=0
83+
export FLAGS_cascade_attention_max_partition_size=128
84+
export FLAGS_cascade_attention_deal_each_time=16
6785
python deploy/deepseek_vl2/deepseek_vl2_infer.py \
6886
--model_name_or_path deepseek-ai/deepseek-vl2-small \
6987
--question "Describe this image." \
7088
--image_file paddlemix/demo_images/examples_image1.jpg \
7189
--min_length 128 \
7290
--max_length 128 \
73-
--top_k 1 \
74-
--top_p 0.001 \
75-
--temperature 0.1 \
76-
--repetition_penalty 1.05 \
77-
--block_attn True \
7891
--inference_model True \
7992
--append_attn True \
8093
--mode dynamic \
8194
--dtype bfloat16 \
82-
--mla_use_matrix_absorption \
83-
--quant_type "weight_only_int8"
95+
--top_k 1 \
96+
--top_p 0.001 \
97+
--temperature 0.1 \
98+
--repetition_penalty 1.05 \
99+
--quant_type "weight_only_int8" \
100+
--benchmark True
84101
```
85102

86103
## 4 One-command inference & inference notes
87-
Run from the PaddleMIX directory
88-
```bash
89104
cd PaddleMIX
90105
sh deploy/deepseek_vl2/shell/run.sh
91-
```
92106
#### Parameter settings
93107
| parameter | Value |
94108
| ------------------ | -------------- |
@@ -102,10 +116,3 @@ sh deploy/deepseek_vl2/shell/run.sh
102116
| ------------------ | -------------- |
103117
| min_length | 128 |
104118
| min_length | 128 |
105-
106-
Timing results for a single image are shown below
107-
108-
| model | Paddle high-performance inference | Paddle |
109-
| ------------------------------ | ---------------------| ------------- |
110-
| deepseek-ai/deepseek-vl2-small | 9.3 s | 12.8 s |
111-
| deepseek-ai/deepseek-vl2 | - | 17.2 s |
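
For readers who want to try the updated flags from this diff, the following is a minimal sketch that exports the new `FLAGS_*` values and assembles the single-image bf16 command shown in the README. The `DRY_RUN` variable is a hypothetical convenience introduced here, not something the repository defines; with its default value the script only prints the command instead of launching inference.

```shell
# Sketch only: values are copied from the updated README in this commit.
# DRY_RUN is a hypothetical helper variable, not part of PaddleMIX.
set -eu

export CUDA_VISIBLE_DEVICES=0
export FLAGS_mla_use_tensorcore=0
export FLAGS_cascade_attention_max_partition_size=128
export FLAGS_cascade_attention_deal_each_time=16

# Store the command in the positional parameters so quoting is preserved.
set -- python deploy/deepseek_vl2/deepseek_vl2_infer.py \
  --model_name_or_path deepseek-ai/deepseek-vl2-small \
  --question "Describe this image." \
  --image_file paddlemix/demo_images/examples_image1.jpg \
  --min_length 128 \
  --max_length 128 \
  --inference_model True \
  --append_attn True \
  --mode dynamic \
  --dtype bfloat16 \
  --top_k 1 \
  --top_p 0.001 \
  --temperature 0.1 \
  --repetition_penalty 1.05 \
  --benchmark

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Print the assembled command without running it.
  printf '%s\n' "$*"
else
  "$@"
fi
```

Setting `DRY_RUN=0` in the environment runs the command for real, which assumes the PaddleMIX checkout and the dependencies from the environment setup section are already in place.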
