The program hangs and then crashes when using multiple threads on the GPU.

*********************************************
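Friendly reminder: based on informal community statistics, asking questions using the template helps get faster replies and resolutions.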
*********************************************

## Environment

- [FastDeploy version]: fastdeploy-win-x64-gpu-1.0.7
- [Build command]: using the downloaded prebuilt library fastdeploy-win-x64-gpu-1.0.7
- [System platform]: Windows x64 (Windows 11)
- [Hardware]: Nvidia GPU 4070 Laptop, CUDA 11.6, cuDNN 8.2
- [Language]: C++

## Problem log and steps to reproduce
Based on the official example `multi_thread.cc`, I modified the code as follows:
```cpp
#include <iostream>
#include <string>
#include <thread>
#include <future>
#include <vector>
#include "fastdeploy/vision.h"
#ifdef WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

// void Predict(fastdeploy::vision::detection::PPDetBase *model, int thread_id, const std::vector<std::string>& images) {
void Predict(fastdeploy::vision::detection::PPDetBase *model, int thread_id, const std::string& image_file) {
    // for (auto const &image_file : images) {
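        auto im = cv::imread(image_file);
        // Bail out early if the image could not be read (e.g. wrong path).
        if (im.empty()) {
            std::cerr << "Failed to read image: " << image_file << std::endl;
            return;
        }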
        fastdeploy::vision::DetectionResult res;
        if (!model->Predict(im, &res)) {
            std::cerr << "Failed to predict." << std::endl;
            return;
        }

        // print res
        std::cout << "Thread Id: " << thread_id << std::endl;
        // std::cout << res.Str() << std::endl;
    // }
}

void GetImageList(std::vector<std::vector<std::string>>* image_list, const std::string& image_file_path, int thread_num){
    std::vector<cv::String> images;
    cv::glob(image_file_path, images, false);
    // number of image files in images folder
    size_t count = images.size();
    size_t num = count / thread_num;
    for (int i = 0; i < thread_num; i++) {
        std::vector<std::string> temp_list;
        if (i == thread_num - 1) {
            for (size_t j = i*num; j < count; j++){
                temp_list.push_back(images[j]);
            }
        } else {
            for (size_t j = 0; j < num; j++){
                temp_list.push_back(images[i * num + j]);
            }
        }
        (*image_list)[i] = temp_list;
    }
}

void GpuInfer(const std::string& model_dir, const std::string& image_file_path, int thread_num) {
    auto model_file = model_dir + sep + "model.pdmodel";
    auto params_file = model_dir + sep + "model.pdiparams";
    auto config_file = model_dir + sep + "infer_cfg.yml";
    auto option = fastdeploy::RuntimeOption();
    option.UseGpu();
    option.UsePaddleBackend();
    auto model = fastdeploy::vision::detection::PPYOLOE(
        model_file, params_file, config_file, option);
    if (!model.Initialized()) {
        std::cerr << "Failed to initialize." << std::endl;
        return;
    }

    std::vector<decltype(model.Clone())> models;
    for (int i = 0; i < thread_num; ++i) {
        models.emplace_back(model.Clone());
    }

    std::vector<std::vector<std::string>> image_list(thread_num);
    GetImageList(&image_list, image_file_path, thread_num);

    std::thread t1(Predict, models[0].get(), 0, R"(E:\project_codes\paddle_test\release\test1.jpg)");
    std::thread t2(Predict, models[1].get(), 1, R"(E:\project_codes\paddle_test\release\test2.jpg)");
    // std::thread t3(Predict, models[2].get(), 2, R"(E:\project_codes\paddle_test\release\test3.jpg)");
    // std::thread t4(Predict, models[3].get(), 3, R"(E:\project_codes\paddle_test\release\test4.jpg)");

    t1.join();
    t2.join();
    // t3.join();
    // t4.join();

    // auto ret1 = std::async(std::launch::async, Predict, models[0].get(), 0, R"(E:\project_codes\paddle_test\release\test.jpg)");
    // auto ret2 = std::async(std::launch::async, Predict, models[1].get(), 1, R"(E:\project_codes\paddle_test\release\test.jpg)");

    // ret1.get();
    // ret2.get();

    // std::vector<std::thread> threads;
    // for (int i = 0; i < thread_num; ++i) {
    //     threads.emplace_back(Predict, models[i].get(), i, image_list[i]);
    // }

    // for (int i = 0; i < thread_num; ++i) {
    //     threads[i].join();
    // }
}

int main(int argc, char **argv) {
    GpuInfer(R"(E:\project_codes\paddle_test\release\ppyoloe_plus_crn_m_80e_coco)", R"(E:\project_codes\paddle_test\release\test1.jpg)" , 2);

    return 0;
}
```
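For reference, `GetImageList` splits the globbed file list into `thread_num` contiguous chunks, with the last chunk also taking any remainder. The standalone sketch below reproduces that partitioning without OpenCV so it can be checked in isolation; `SplitIntoChunks` is a hypothetical name, not part of FastDeploy or the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical standalone version of the GetImageList partitioning logic:
// split `files` into `thread_num` contiguous chunks; the last chunk also
// receives the remainder when files.size() is not divisible by thread_num.
std::vector<std::vector<std::string>> SplitIntoChunks(
    const std::vector<std::string>& files, int thread_num) {
  assert(thread_num > 0 && "thread_num must be positive");
  std::vector<std::vector<std::string>> chunks(thread_num);
  const std::size_t count = files.size();
  const std::size_t per_chunk = count / static_cast<std::size_t>(thread_num);
  for (int i = 0; i < thread_num; ++i) {
    const std::size_t first = static_cast<std::size_t>(i) * per_chunk;
    // The last chunk runs to the end of the list to absorb the remainder.
    const std::size_t last = (i == thread_num - 1) ? count : first + per_chunk;
    chunks[i].assign(files.begin() + static_cast<std::ptrdiff_t>(first),
                     files.begin() + static_cast<std::ptrdiff_t>(last));
  }
  return chunks;
}
```

With the arguments used in `main` (a path that matches a single image and `thread_num = 2`), the first chunk comes out empty and the second holds the one image, assuming `cv::glob` returns that single file.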
Console output:

> 17:11:56: Starting E:\project_codes\paddle_test\release\paddle_test.exe...
[INFO] fastdeploy/vision/common/processors/transform.cc(45)::fastdeploy::vision::FuseNormalizeCast	Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(93)::fastdeploy::vision::FuseNormalizeHWC2CHW	Normalize and HWC2CHW are fused to NormalizeAndPermute  in preprocessing pipeline.
[INFO] fastdeploy/vision/common/processors/transform.cc(159)::fastdeploy::vision::FuseNormalizeColorConvert	BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] fastdeploy/runtime/runtime.cc(273)::fastdeploy::Runtime::CreatePaddleBackend	Runtime initialized with Backend::PDINFER in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(384)::fastdeploy::Runtime::Clone	Runtime Clone with Backend:: Backend::PDINFER in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(384)::fastdeploy::Runtime::Clone	Runtime Clone with Backend:: Backend::PDINFER in Device::GPU.
17:13:23: E:\project_codes\paddle_test\release\paddle_test.exe crashed.

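I switched the model in the official example to PPYOLOE and made the corresponding changes. When two threads are started, as below, the program hangs and then crashes:
```cpp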
 std::thread t1(Predict, models[0].get(), 0, R"(E:\project_codes\paddle_test\release\test1.jpg)");
 std::thread t2(Predict, models[1].get(), 1, R"(E:\project_codes\paddle_test\release\test2.jpg)");
 // std::thread t3(Predict, models[2].get(), 2, R"(E:\project_codes\paddle_test\release\test3.jpg)");
 // std::thread t4(Predict, models[3].get(), 3, R"(E:\project_codes\paddle_test\release\test4.jpg)");

 t1.join();
 t2.join();
```
With only one thread started, however, it runs normally:
```cpp
 std::thread t1(Predict, models[0].get(), 0, R"(E:\project_codes\paddle_test\release\test1.jpg)");
 // std::thread t2(Predict, models[1].get(), 1, R"(E:\project_codes\paddle_test\release\test2.jpg)");
 // std::thread t3(Predict, models[2].get(), 2, R"(E:\project_codes\paddle_test\release\test3.jpg)");
 // std::thread t4(Predict, models[3].get(), 3, R"(E:\project_codes\paddle_test\release\test4.jpg)");

 t1.join();
 // t2.join();
```
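To check whether the crash depends on the two `Predict` calls overlapping in time, one option is to keep one model clone per thread (as in the commented-out loop in `GpuInfer`) but optionally serialize the calls with a mutex. The sketch below shows only that threading pattern: `StubModel` and `RunPerThread` are hypothetical names, and `StubModel` is a placeholder rather than a FastDeploy class; in the real program it would be replaced by the clones returned from `model.Clone()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a cloned detection model; NOT a FastDeploy type.
// It only counts how many images it was asked to "process".
struct StubModel {
  int processed = 0;
  bool Predict(const std::string& /*image_file*/) {
    ++processed;
    return true;
  }
};

// Runs image_lists[i] on models[i] in thread i. If `serialize` is true, a
// shared mutex lets only one Predict call run at a time, which helps tell
// whether a failure requires truly concurrent inference.
void RunPerThread(std::vector<std::unique_ptr<StubModel>>& models,
                  const std::vector<std::vector<std::string>>& image_lists,
                  bool serialize) {
  assert(image_lists.size() >= models.size());
  std::mutex predict_mutex;
  std::vector<std::thread> threads;
  threads.reserve(models.size());
  for (std::size_t i = 0; i < models.size(); ++i) {
    threads.emplace_back([&, i] {
      for (const auto& image_file : image_lists[i]) {
        if (serialize) {
          std::lock_guard<std::mutex> lock(predict_mutex);
          models[i]->Predict(image_file);
        } else {
          models[i]->Predict(image_file);
        }
      }
    });
  }
  for (auto& t : threads) {
    t.join();  // wait for every worker before returning
  }
}
```

If the real program runs with the calls serialized but fails when they overlap, that narrows the problem to concurrent inference; it is a diagnostic step, not a fix.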
