Paddle C++ inference does not work with FP16 quantization on Jetson #74262

@quangsonle

Description

Describe the Bug

When I run segmentation with run_mode == "trt_fp16" in the Jetson environment, the model detects nothing, while the same model segments well with run_mode == "paddle" or run_mode == "trt_fp32", given the same input.

Paddle was built from source on the Jetson itself with:
CUDA version: 12.2
CUDNN version: v8.9
CXX compiler version: 11.4.0
WITH_TENSORRT: ON
TensorRT version: v8.6.2.3

by these commands:

```shell
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle

mkdir -p build && cd ./build

cmake .. -DPY_VERSION=3.10 \
  -DWITH_MKL=OFF \
  -DWITH_TESTING=OFF \
  -DCMAKE_BUILD_TYPE=Release \
  -DON_INFER=ON \
  -DWITH_PYTHON=ON \
  -DWITH_XBYAK=OFF \
  -DWITH_NV_JETSON=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
  -DWITH_NCCL=OFF \
  -DWITH_RCCL=OFF \
  -DWITH_DISTRIBUTE=OFF \
  -DWITH_GPU=ON \
  -DWITH_TENSORRT=ON \
  -DWITH_ARM=ON

ulimit -n 65535 && make TARGET=ARMV8 -j3
```

With the same model and the same input, FP16 quantization works on an AMD (x86-64) machine.
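For reference, this is a minimal sketch (an assumption about the setup, not the actual PaddleDetection code) of how a run_mode of "trt_fp16" is typically mapped onto the Paddle C++ inference Config; the model paths and size arguments are placeholders:

```cpp
#include "paddle_inference_api.h"  // Paddle Inference C++ API

// Hypothetical config: "trt_fp16" usually means enabling the TensorRT
// subgraph engine with half precision. Paths and sizes are placeholders.
paddle_infer::Config config;
config.SetModel("model.pdmodel", "model.pdiparams");  // placeholder paths
config.EnableUseGpu(200 /*initial pool, MB*/, 0 /*GPU id*/);
config.EnableTensorRtEngine(1 << 30 /*workspace bytes*/,
                            1 /*max_batch_size*/,
                            3 /*min_subgraph_size*/,
                            paddle_infer::PrecisionType::kHalf,  // fp16
                            false /*use_static*/,
                            false /*use_calib_mode*/);
auto predictor = paddle_infer::CreatePredictor(config);
```

If the only difference between the working and failing runs is `PrecisionType::kHalf` versus `kFloat32` here, that points the NaNs at the half-precision TensorRT engine rather than at the model or the preprocessing.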

Additional Supplementary Information

P.S.: I added debug logs in object_detector.cc: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.8.1/deploy/cpp/src/object_detector.cc
```cpp
// ... inside the Predict() function ...

auto inference_end = std::chrono::steady_clock::now();

// ====================== DEBUG BLOCK WAS ADDED HERE ======================
// Check the contents of the main output tensor (out_tensor_list[0])
// immediately after it comes back from the model.
if (!out_tensor_list.empty() && !out_tensor_list[0].empty()) {
  std::cout << "----------------------------------------------------" << std::endl;
  std::cout << "[DEBUG] First 12 values of the BBox/Score Output Tensor:" << std::endl;
  for (int i = 0; i < 12 && i < static_cast<int>(out_tensor_list[0].size()); ++i) {
    std::cout << out_tensor_list[0][i] << " ";
    if ((i + 1) % 6 == 0) {
      std::cout << std::endl;
    }
  }
  std::cout << "----------------------------------------------------" << std::endl;
}
```

and got, when it ran with FP16:

```
[DEBUG] First 12 values of the BBox/Score Output Tensor:
0.0000 nan nan nan nan nan
0.0000 nan nan nan nan nan
```

Apparently the FP16 mode of the Paddle model does not match the Jetson TensorRT architecture.
