refine code

1649759610 · 1649759610 · commit eed7176aa756 · 2023-09-15T14:10:44.000Z
diff --git a/paddlemix/examples/minigpt4/deploy/README.md b/paddlemix/examples/minigpt4/deploy/README.md
@@ -0,0 +1,120 @@
+# MiniGPT4 Inference Acceleration
+
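+This project provides inference acceleration for MiniGPT4. The basic approach is to convert the MiniGPT4 dynamic graph into a static graph and then accelerate inference with the Paddle Inference library.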
+
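+The figure below shows the overall architecture of MiniGPT4. MiniGPT4 consists mainly of a ViT, a QFormer, and a Vicuna model. Vicuna is trained from Llama and the implementation calls the Llama code, so to avoid unnecessary distinctions the rest of this document refers to the language model as Llama.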
+
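+In this solution, MiniGPT4 is exported as two static subgraphs: the ViT and QFormer together form one subgraph, and Llama forms the other. The two subgraphs are then combined to run end-to-end MiniGPT4 inference.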
+
+<center><img src="https://github.com/PaddlePaddle/Paddle/assets/35913314/f0306cb6-4837-4f52-8f57-a0e7e35238f6" /></center>
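The two subgraphs meet at the prompt: the text before the image placeholder, the image features from the first subgraph, and the text after the placeholder are joined into one sequence for Llama (run_static_predict.py computes its length as `image_features.shape[1] + first_input_ids.shape[1] + second_input_ids.shape[1]`). A minimal pure-Python sketch of that splitting and splicing, assuming the prompt template used in run_static_predict.py; `split_prompt` and `splice` are hypothetical helpers, and MiniGPT4Processor may tokenize the segments differently.

```python
from typing import List, Sequence, Tuple

IMAGE_PLACEHOLDER = "<ImageHere>"
TEXT_PLACEHOLDER = "<TextHere>"


def split_prompt(prompt: str, text: str) -> Tuple[str, str]:
    """Fill in the user text, then split the prompt around the image placeholder.

    Hypothetical helper: the first segment precedes the image features,
    the second segment follows them.
    """
    filled = prompt.replace(TEXT_PLACEHOLDER, text)
    if filled.count(IMAGE_PLACEHOLDER) != 1:
        raise ValueError("prompt must contain exactly one <ImageHere> placeholder")
    first, second = filled.split(IMAGE_PLACEHOLDER)
    return first, second


def splice(first: Sequence, image: Sequence, second: Sequence) -> List:
    """Concatenate per-token items (ids or embedding rows) along the sequence axis."""
    return list(first) + list(image) + list(second)


prompt = (
    "Give the following image: <Img>ImageContent</Img>. You will be able to see the image "
    "once I provide it to you. Please answer my questions."
    "###Human: <Img><ImageHere></Img> <TextHere>###Assistant:"
)
first, second = split_prompt(prompt, "describe this image")
print(repr(first[-15:]))  # '###Human: <Img>'
print(repr(second))       # '</Img> describe this image###Assistant:'
```

The spliced length is simply the sum of the three segment lengths, which is why the script can size its buffers before generation starts.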
+
+
+
+
+## 1. Environment Setup
+### 1.1 Base Environment
+This project has been verified in the following environment:
+- CUDA: 11.7
+- python: 3.11
+- paddle: develop build
+
+CUDA must be version 11.2 or later. You can download the Paddle build you need [here](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html).
+
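A small sketch of the CUDA >= 11.2 requirement as a version check. It assumes you already have a plain "major.minor" version string, such as the one reported by `paddle.version.cuda()` in an installed Paddle build; `cuda_version_ok` is a hypothetical helper.

```python
from typing import Tuple


def cuda_version_ok(version: str, minimum: Tuple[int, int] = (11, 2)) -> bool:
    """Return True if a 'major.minor[.patch]' CUDA version string is at least `minimum`."""
    try:
        major_minor = tuple(int(part) for part in version.strip().split(".")[:2])
    except ValueError:
        return False  # non-numeric input, e.g. a build without CUDA support
    if len(major_minor) < 2:
        major_minor = major_minor + (0,)  # treat "12" as "12.0"
    return major_minor >= minimum


print(cuda_version_ok("11.7"))  # True (the verified setup)
print(cuda_version_ok("10.2"))  # False
```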
+
+### 1.2 Install Dependencies
+1. This project requires both PaddleMIX and PaddleNLP, on the latest develop branch:
+
+```shell
+git clone https://github.com/PaddlePaddle/PaddleNLP.git
+git clone https://github.com/PaddlePaddle/PaddleMIX.git
+```
+
+2. Install paddlenlp_ops:
+```shell
+cd PaddleNLP/csrc
+python setup_cuda.py install
+```
+
+3. Finally, set the environment variable:
+```shell
+export PYTHONPATH=yourpath/PaddleNLP:yourpath/PaddleMIX
+```
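When exporting in the shell, write `NAME=value` with no space after `=`; otherwise the shell sets the variable to an empty string and rejects the path list as an invalid identifier. If you prefer to set the paths from Python (for example in a notebook), a minimal equivalent sketch prepends both repositories to `sys.path`; `add_repo_paths` is a hypothetical helper and `yourpath` is the same placeholder as above.

```python
import os
import sys
from typing import List, Sequence


def add_repo_paths(root: str, repos: Sequence[str] = ("PaddleNLP", "PaddleMIX")) -> List[str]:
    """Prepend <root>/<repo> to sys.path for each repo and return the paths (hypothetical helper)."""
    paths = [os.path.join(root, repo) for repo in repos]
    # Insert in reverse so the final order in sys.path matches `repos`.
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths


paths = add_repo_paths("yourpath")
print(os.pathsep.join(paths))  # on Linux: yourpath/PaddleNLP:yourpath/PaddleMIX
```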
+
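+### 1.3 Notes
+MiniGPT4 inference acceleration currently requires patching some code in PaddleNLP and Paddle. These fixes will gradually be merged into PaddleNLP and Paddle; until then, apply them manually.
+1. Patch PaddleNLP:
+Using this [branch](https://github.com/1649759610/PaddleNLP/tree/bugfix_minigpt4) as a reference, replace the following files: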
+- PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py
+- PaddleNLP/paddlenlp/experimental/transformers/llama/modeling.py
+- PaddleNLP/llm/export_model.py
+
+2. Patch Paddle
+Go to the Paddle installation directory, open paddle/static/io.py, and comment out lines 284-287:
+```python
+if not skip_prune_program:
+    copy_program = copy_program._prune_with_input(
+        feeded_var_names=feed_var_names, targets=fetch_vars
+    )
+```
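Because the line numbers 284-287 refer to a specific Paddle develop snapshot, matching on the code itself can be more robust. A minimal sketch, assuming the block starts with `if not skip_prune_program:` exactly as shown; `comment_out_prune_block` is a hypothetical helper that only transforms text and does not touch any file.

```python
def comment_out_prune_block(source: str, header: str = "if not skip_prune_program:") -> str:
    """Comment out the first block that starts with `header` (hypothetical helper).

    The header line and every following line indented deeper than it, up to the
    first blank or shallower line, get '# ' inserted after their indentation.
    All other lines are returned unchanged.
    """
    out = []
    done = False
    in_block = False
    block_indent = 0
    for line in source.splitlines(keepends=True):
        body = line.lstrip()
        indent = len(line) - len(body)
        if in_block and body.strip() and indent > block_indent:
            out.append(line[:indent] + "# " + body)
            continue
        in_block = False
        if not done and body.startswith(header):
            done = in_block = True
            block_indent = indent
            out.append(line[:indent] + "# " + body)
            continue
        out.append(line)
    return "".join(out)
```

To apply it, you could read paddle/static/io.py with `pathlib.Path.read_text`, keep a backup, and write the result back with `write_text`. Running it twice is harmless, because an already-commented header no longer matches.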
+
+## 2. Exporting MiniGPT4 in Stages
+
+### 2.1 Export the First Subgraph
+Make sure you are in PaddleMIX/paddlemix/examples/minigpt4/deploy, then run the following command:
+```shell
+python export_image_encoder.py \
+    --minigpt4_13b_path "your minigpt4 dir path" \
+    --save_path "./checkpoints/encode_image/encode_image" 
+```
+
+**Parameters**:
+- minigpt4_13b_path: directory containing the MiniGPT4 model
+- save_path: output path and file-name prefix for the exported first subgraph
+
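`save_path` is a path plus file-name prefix rather than a directory. Assuming the exporter follows Paddle's usual static-graph naming, the export should produce `<save_path>.pdmodel` and `<save_path>.pdiparams`; the hypothetical helper below lists whichever of these is still missing.

```python
import os
from typing import List


def missing_export_files(prefix: str) -> List[str]:
    """List the files expected for a static-graph export prefix that do not exist yet."""
    expected = [prefix + ".pdmodel", prefix + ".pdiparams"]
    return [path for path in expected if not os.path.isfile(path)]


missing = missing_export_files("./checkpoints/encode_image/encode_image")
if missing:
    print("export incomplete, missing:", missing)
```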
+
+### 2.2 Export the Second Subgraph
+Go to PaddleNLP/llm and run the following command:
+```shell
+python export_model.py \
+    --model_name_or_path "your llama dir path" \
+    --output_path "your output path" \
+    --dtype float16 \
+    --inference_model \
+    --model_prefix llama \
+    --model_type llama-img2txt
+```
+
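+**Parameters**:
+- model_name_or_path: directory containing the Llama model
+- output_path: output path and name for the exported language model
+- dtype: data type of the model weights
+- inference_model: export as an inference model
+- model_prefix: specifies the model prefix
+- model_type: specifies the model type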
+
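When scripting the second export, building the argument list explicitly avoids quoting problems with paths that contain spaces. A minimal sketch mirroring the command above; `build_llama_export_cmd` and `run_llama_export` are hypothetical wrappers, and only the flags shown in this README are used.

```python
import os
import shlex
import subprocess
from typing import List


def build_llama_export_cmd(model_dir: str, output_path: str, dtype: str = "float16") -> List[str]:
    """Assemble the export_model.py argument list documented above (hypothetical wrapper)."""
    return [
        "python", "export_model.py",
        "--model_name_or_path", model_dir,
        "--output_path", output_path,
        "--dtype", dtype,
        "--inference_model",  # passed without a value, as in the command above
        "--model_prefix", "llama",
        "--model_type", "llama-img2txt",
    ]


def run_llama_export(paddlenlp_dir: str, model_dir: str, output_path: str) -> None:
    """Run the export from PaddleNLP/llm; raises CalledProcessError if it fails."""
    cmd = build_llama_export_cmd(model_dir, output_path)
    print(shlex.join(cmd))
    subprocess.run(cmd, cwd=os.path.join(paddlenlp_dir, "llm"), check=True)
```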
+**Note**: The Llama part currently has to be exported manually from within PaddleNLP; one-step export from PaddleMIX will be supported later.
+
+## 3. MiniGPT4 Static-Graph Inference
+Go to PaddleMIX/paddlemix/examples/minigpt4/deploy and run:
+```shell
+python run_static_predict.py \
+    --first_model_path "The dir name of image encoder model" \
+    --second_model_path "The dir name of language model" \
+    --minigpt4_path "The MiniGPT4 dir containing the tokenizer"
+```
+
+**Parameters**:
+- first_model_path: directory of the first static-graph model (ViT and QFormer)
+- second_model_path: directory of the second static-graph model (the language model)
+- minigpt4_path: directory containing the MiniGPT4 tokenizer
+
+For the image below, MiniGPT4 static-graph inference produces the following output:
+
+<center><img src="https://paddlenlp.bj.bcebos.com/data/images/mugs.png" /></center>
+
+```text
+Reference: The image shows two black and white cats sitting next to each other on a blue background. The cats have black fur and white fur with black noses, eyes, and paws. They are both looking at the camera with a curious expression. The mugs are also blue with the same design of the cats on them. There is a small white flower on the left side of the mug. The background is a light blue color.
+
+Outputs:  ['The image shows two black and white cats sitting next to each other on a blue background. The cats have black fur and white fur with black noses, eyes, and paws. They are both looking at the camera with a curious expression. The mugs are also blue with the same design of the cats on them. There is a small white flower on the left side of the mug. The background is a light blue color.##']
+```
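The raw output above ends with `##`, a fragment of the `###` turn separator used in the prompt; apart from that it matches the reference. A minimal post-processing sketch, assuming that trimming at the separator is acceptable for your use; `clean_generation` is a hypothetical helper, not part of PaddleMIX.

```python
def clean_generation(text: str, sep: str = "###") -> str:
    """Cut the text at the first turn separator and strip a partial separator left at the end."""
    return text.split(sep, 1)[0].rstrip("#").strip()


output = "The background is a light blue color.##"
print(clean_generation(output))  # The background is a light blue color.
```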
diff --git a/paddlemix/examples/minigpt4/deploy/export_image_encoder.py b/paddlemix/examples/minigpt4/deploy/export_image_encoder.py
@@ -1,6 +1,6 @@
 import argparse
 import os
-os.environ["CUDA_VISIBLE_DEVICES"]="7"
+os.environ["CUDA_VISIBLE_DEVICES"]="0"
 os.environ["FLAGS_use_cuda_managed_memory"]="true"
 
 import paddle
@@ -41,26 +41,3 @@ def export(args):
     args = parser.parse_args()
 
     export(args)
-
-
-
-
-
-
-
-
-# processor = MiniGPT4Processor.from_pretrained(minigpt4_13b_path)
-# print("load processor and model done!")
-
-# # prepare model inputs for MiniGPT4
-# url = "https://paddlenlp.bj.bcebos.com/data/images/mugs.png"
-# image = Image.open(requests.get(url, stream=True).raw)
-
-# inputs = processor.process_images(image)
-# model.
-
-
-# # generate with MiniGPT4
-# outputs = model.generate(**inputs, **generate_kwargs)
-# msg = processor.batch_decode(outputs[0])
-# print(msg)
diff --git a/paddlemix/examples/minigpt4/deploy/run_static_predict.py b/paddlemix/examples/minigpt4/deploy/run_static_predict.py
@@ -1,21 +1,19 @@
 import argparse
 import os
-os.environ["CUDA_VISIBLE_DEVICES"] = "7"
+import sys
+import requests
+import numpy as np
+import datetime
+os.environ["CUDA_VISIBLE_DEVICES"] = "0"
 os.environ["FLAGS_use_cuda_managed_memory"] = "true"
 
-
 import paddle
 from paddle import inference
 from paddlenlp.transformers import MiniGPT4Processor
 from PIL import Image
-import requests
 
-import sys
-
-# sys.path.append("/wangqinghui/PaddleNLP/llm")
 from utils import load_real_time_tokens
 
-import numpy as np
 
 class Predictor(object):
     def __init__(self, args):
@@ -62,7 +60,6 @@ def create_predictor(self, model_path):
             # such as initialize the gpu memory, enable tensorrt
             config.enable_use_gpu(100, 0)
             precision_mode = inference.PrecisionType.Half
-            # breakpoint()
             # The first model is the one meant to run with TensorRT
             if self.args.use_tensorrt:
                 config.enable_tuned_tensorrt_dynamic_shape(shape_range_file, True)
@@ -74,7 +71,6 @@ def create_predictor(self, model_path):
         predictor = paddle.inference.create_predictor(config)
         input_handles = [predictor.get_input_handle(name) for name in predictor.get_input_names()]
         output_handle = [predictor.get_output_handle(name) for name in predictor.get_output_names()]
-        # output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
 
         return predictor, input_handles, output_handle
 
@@ -93,9 +89,6 @@ def generate_with_image_features(self,
                                      first_attention_mask=None,
                                      second_attention_mask=None,
                                      **generate_kwargs, ):
-        # print("image_attention_mask", image_attention_mask)
-        # print("first_attention_mask", first_attention_mask)
-        # print("second_attention_mask", second_attention_mask)
         batch, seq,_ = image_features.shape
         seq = image_features.shape[1] + first_input_ids.shape[1] + second_input_ids.shape[1]
         max_len = 204
@@ -200,32 +193,26 @@ def predict(self, images, text, prompt=None):
     predictor = Predictor(args)
 
     url = "https://paddlenlp.bj.bcebos.com/data/images/mugs.png"
-    #url = "https://paddlenlp.bj.bcebos.com/data/images/female.png"
     image = Image.open(requests.get(url, stream=True).raw)
 
     text = "describe this image"
     prompt = "Give the following image: <Img>ImageContent</Img>. You will be able to see the image once I provide it to you. Please answer my questions.###Human: <Img><ImageHere></Img> <TextHere>###Assistant:"
 
-    # warp up
-    warm_up_times = 1
-    repeat_times = 5
+    # warm up
+    warm_up_times = 2
+    repeat_times = 10
     for i in range(warm_up_times):
         msg = predictor.predict(image, text, prompt)
 
-    
     # Benchmark: time repeat_times runs
-    import datetime
     starttime = datetime.datetime.now()
-
     for i in range(repeat_times):
         msg = predictor.predict(image, text, prompt)
     
     endtime = datetime.datetime.now()
     duringtime = endtime - starttime
     time_ms = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
 
-    print(
-        "Reference: The image shows two black and white cats sitting next to each other on a blue background. The cats have black fur and white fur with black noses, eyes, and paws. They are both looking at the camera with a curious expression. The mugs are also blue with the same design of the cats on them. There is a small white flower on the left side of the mug. The background is a light blue color.")
+    print("Reference: The image shows two black and white cats sitting next to each other on a blue background. The cats have black fur and white fur with black noses, eyes, and paws. They are both looking at the camera with a curious expression. The mugs are also blue with the same design of the cats on them. There is a small white flower on the left side of the mug. The background is a light blue color.")
     print("Outputs: ", msg)
-    print("infer OK")
-    print("The whoel end to end time : ", time_ms / repeat_times, "ms")
+    print("Average end-to-end time: ", time_ms / repeat_times, "ms")
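run_static_predict.py runs a few warm-up predictions and then averages the wall-clock time of repeated runs. A minimal sketch of the same measurement using `time.perf_counter`, a monotonic high-resolution clock intended for timing intervals; `benchmark_ms` is a hypothetical helper.

```python
import time
from typing import Callable


def benchmark_ms(fn: Callable[[], object], warm_up_times: int = 2, repeat_times: int = 10) -> float:
    """Average wall-clock latency of fn() in milliseconds, excluding warm-up runs."""
    if repeat_times < 1:
        raise ValueError("repeat_times must be >= 1")
    for _ in range(warm_up_times):
        fn()
    start = time.perf_counter()
    for _ in range(repeat_times):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeat_times
```

For example, `avg = benchmark_ms(lambda: predictor.predict(image, text, prompt))` reuses the call from the script. Excluding warm-up runs keeps one-time costs, such as memory allocation or TensorRT engine building, out of the average.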
diff --git a/paddlemix/examples/minigpt4/deploy/utils.py b/paddlemix/examples/minigpt4/deploy/utils.py
@@ -0,0 +1,65 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import annotations
+
+import glob
+import os
+import struct
+import numpy as np
+
+
+def deserialize_from_file(fp):
+    x_type = fp.read(1)
+    x_type_out = struct.unpack("c", x_type)[0]
+    # data
+    data_list = []
+    if x_type_out == b"0":
+        data = fp.read(4)
+        while data:
+            data_out = struct.unpack("f", data)[0]
+            data_list.append(data_out)
+            data = fp.read(4)
+    elif x_type_out == b"1":
+        data = fp.read(8)
+        while data:
+            data_out = struct.unpack("q", data)[0]
+            data_list.append(data_out)
+            data = fp.read(8)
+    elif x_type_out == b"2":
+        data = fp.read(4)
+        while data:
+            data_out = struct.unpack("i", data)[0]
+            data_list.append(data_out)
+            data = fp.read(4)
+    else:
+        raise ValueError("unknown type tag: {!r}".format(x_type_out))
+    data_arr = np.array(data_list)
+    return data_arr
+
+def load_real_time_tokens():
+    tokens = []
+    files = glob.glob("./real_time_save.*")
+    for j in range(1, len(files) + 1):
+        filename = "./real_time_save.temp_ids_rank_0_step_{}".format(j)
+        if not os.path.exists(filename):
+            break
+        with open(filename, "rb") as fp:
+            fp.read(1)
+            data_list = deserialize_from_file(fp)
+        tokens.append(np.array(data_list).reshape(-1, 1))
+    for path in glob.glob("./real_time_save.temp_ids_rank_*"):
+        os.remove(path)
+    tokens = np.concatenate(tokens, axis=1)
+    return tokens
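deserialize_from_file reads a one-byte type tag (`b"0"` float32, `b"1"` 8-byte integer, `b"2"` int32) followed by packed values, and load_real_time_tokens turns each per-step file into one column of a `[batch, steps]` array (it also skips one leading byte before the tag, whose meaning is not documented here). The format below is inferred from that reader only; the writer lives in the generation code and is not shown. A stdlib round-trip sketch, useful for unit-testing the reader without a GPU; the helper names are hypothetical, and the 8-byte integers use the fixed-size `q` struct code.

```python
import io
import struct
from typing import BinaryIO, List, Sequence

# Type tag -> struct format code, following deserialize_from_file.
_FORMATS = {b"0": "f", b"1": "q", b"2": "i"}


def serialize_values(values: Sequence, tag: bytes = b"1") -> bytes:
    """Encode values as <tag><packed values> (hypothetical writer for tests)."""
    fmt = _FORMATS[tag]
    return tag + b"".join(struct.pack(fmt, v) for v in values)


def deserialize_values(fp: BinaryIO) -> list:
    """Decode <tag><packed values>; an empty payload yields an empty list."""
    tag = fp.read(1)
    if tag not in _FORMATS:
        raise ValueError("unknown type tag: {!r}".format(tag))
    return [value for (value,) in struct.iter_unpack(_FORMATS[tag], fp.read())]


def stack_steps(steps: List[List[int]]) -> List[List[int]]:
    """Turn per-step token lists (one entry per batch item) into [batch][steps] rows."""
    if any(len(step) != len(steps[0]) for step in steps):
        raise ValueError("every step must have the same batch size")
    return [list(row) for row in zip(*steps)]


data = serialize_values([5, 42], tag=b"1")
print(deserialize_values(io.BytesIO(data)))  # [5, 42]
print(stack_steps([[1, 2], [3, 4], [5, 6]]))  # [[1, 3, 5], [2, 4, 6]]
```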
diff --git a/paddlemix/examples/minigpt4/inference/README.md b/paddlemix/examples/minigpt4/inference/README.md
@@ -37,7 +37,7 @@ python setup_cuda.py install
 
 3. Finally, set the environment variable:
 ```shell
-export PYTHONPATH=/wangqinghui/PaddleNLP:/wangqinghui/PaddleMIX
+export PYTHONPATH=yourpath/PaddleNLP:yourpath/PaddleMIX
 ```
 
 ### 1.3 Notes
@@ -51,16 +51,16 @@ export PYTHONPATH=/wangqinghui/PaddleNLP:/wangqinghui/PaddleMIX
 2. Patch Paddle
 Go to the Paddle installation directory, open paddle/static/io.py, and comment out lines 284-287:
 ```python
-    if not skip_prune_program:
-        copy_program = copy_program._prune_with_input(
-            feeded_var_names=feed_var_names, targets=fetch_vars
-        )
+if not skip_prune_program:
+    copy_program = copy_program._prune_with_input(
+        feeded_var_names=feed_var_names, targets=fetch_vars
+    )
 ```
 
 ## 2. Exporting MiniGPT4 in Stages
 
 ### 2.1 Export the First Subgraph
-Make sure you are in PaddleMIX/paddlemix/examples/minigpt4/inference, then run the following command:
+Make sure you are in PaddleMIX/paddlemix/examples/minigpt4/deploy, then run the following command:
 ```
 python export_image_encoder.py \
     --minigpt4_13b_path "your minigpt4 dir path" \
@@ -83,7 +83,7 @@ python export_model.py \
 **Note**: The Llama part currently has to be exported manually from within PaddleNLP; one-step export from PaddleMIX will be supported later.
 
 ## 3. MiniGPT4 Static-Graph Inference
-Go to PaddleMIX/paddlemix/examples/minigpt4/inference and run:
+Go to PaddleMIX/paddlemix/examples/minigpt4/deploy and run:
 ```python
 python run_static_predict.py \
     --first_model_path "The dir name of image encoder model" \
diff --git a/paddlemix/examples/minigpt4/inference/utils.py b/paddlemix/examples/minigpt4/inference/utils.py