Paddle C++ inference does not work with FP16 quantization on Jetson #74262

@quangsonle

Description

Describe the Bug

When I run segmentation with run_mode == "trt_fp16" in the Jetson environment, the model detects nothing, while the same model segments well with run_mode == "paddle" or run_mode == "trt_fp32", given the same input.

Paddle was built from source on the Jetson itself with:
CUDA version: 12.2
CUDNN version: v8.9
CXX compiler version: 11.4.0
WITH_TENSORRT: ON
TensorRT version: v8.6.2.3

by these commands:

```shell
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle

mkdir -p build && cd ./build

cmake .. -DPY_VERSION=3.10 \
  -DWITH_MKL=OFF \
  -DWITH_TESTING=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  -DON_INFER=ON \
  -DWITH_PYTHON=ON \
  -DWITH_XBYAK=OFF \
  -DWITH_NV_JETSON=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
  -DWITH_NCCL=OFF \
  -DWITH_RCCL=OFF \
  -DWITH_DISTRIBUTE=OFF \
  -DWITH_GPU=ON \
  -DWITH_TENSORRT=ON \
  -DWITH_ARM=ON

ulimit -n 65535 && make TARGET=ARMV8 -j3
```

With the same model and the same input, FP16 quantization works on an AMD (x86-64) machine.
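For reference, this is a minimal sketch (an assumption about the setup, not the actual PaddleDetection code) of how a run_mode of "trt_fp16" is typically mapped onto the Paddle C++ inference Config; the model paths and size arguments are placeholders:

```cpp
#include "paddle_inference_api.h"  // Paddle Inference C++ API

// Hypothetical config: "trt_fp16" usually means enabling the TensorRT
// subgraph engine with half precision. Paths and sizes are placeholders.
paddle_infer::Config config;
config.SetModel("model.pdmodel", "model.pdiparams");  // placeholder paths
config.EnableUseGpu(200 /*initial pool, MB*/, 0 /*GPU id*/);
config.EnableTensorRtEngine(1 << 30 /*workspace bytes*/,
                            1 /*max_batch_size*/,
                            3 /*min_subgraph_size*/,
                            paddle_infer::PrecisionType::kHalf,  // fp16
                            false /*use_static*/,
                            false /*use_calib_mode*/);
auto predictor = paddle_infer::CreatePredictor(config);
```

If the only difference between the working and failing runs is `PrecisionType::kHalf` versus `kFloat32` here, that points the NaNs at the half-precision TensorRT engine rather than at the model or the preprocessing.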

Additional Supplementary Information

P.S.: I added debug logs in object_detector.cc: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.8.1/deploy/cpp/src/object_detector.cc
```cpp
// ... inside the Predict() function ...

auto inference_end = std::chrono::steady_clock::now();

// ====================== DEBUG BLOCK WAS ADDED HERE ======================
// Check the contents of the main output tensor (out_tensor_list[0])
// immediately after it comes back from the model.
if (!out_tensor_list.empty() && !out_tensor_list[0].empty()) {
  std::cout << "----------------------------------------------------" << std::endl;
  std::cout << "[DEBUG] First 12 values of the BBox/Score Output Tensor:" << std::endl;
  for (int i = 0; i < 12 && i < static_cast<int>(out_tensor_list[0].size()); ++i) {
    std::cout << out_tensor_list[0][i] << " ";
    if ((i + 1) % 6 == 0) {
      std::cout << std::endl;
    }
  }
  std::cout << "----------------------------------------------------" << std::endl;
}
```

and got, when it ran with FP16:

```
[DEBUG] First 12 values of the BBox/Score Output Tensor:
0.0000 nan nan nan nan nan
0.0000 nan nan nan nan nan
```

Apparently the FP16 mode of the Paddle model does not match the Jetson TensorRT architecture.
