# Model Inference Benchmark

PaddleX supports benchmarking model inference. To enable it, set the following flags:

* `PADDLE_PDX_INFER_BENCHMARK`: `True` enables the benchmark. Defaults to `False`;
* `PADDLE_PDX_INFER_BENCHMARK_WARMUP`: Number of warmup iterations. If `input` is set to `None`, random data is used for inference before benchmarking starts. Defaults to `0`;
* `PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE`: Size of the randomly generated data. Valid only when `input` is set to `None`. Defaults to `224`;
* `PADDLE_PDX_INFER_BENCHMARK_ITER`: Number of benchmark iterations run with random data. Valid only when `input` is set to `None`. Defaults to `10`;
* `PADDLE_PDX_INFER_BENCHMARK_OUTPUT`: Directory in which to save the benchmark results. Defaults to `None`, which means the results are not saved.

An example is as follows:

```bash
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE=320 \
PADDLE_PDX_INFER_BENCHMARK_ITER=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT=./benchmark \
python main.py \
    -c ./paddlex/configs/object_detection/PicoDet-XS.yaml \
    -o Global.mode=predict \
    -o Predict.model_dir=None \
    -o Predict.batch_size=2 \
    -o Predict.input=None
```
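
If you prefer to launch the same run from Python rather than the shell, the flags can be passed as environment variables to a subprocess. Below is a minimal sketch that simply reproduces the shell command above (paths and config file are taken from that example and are not otherwise special):

```python
import os
import subprocess

# Benchmark flags, equivalent to the shell example above.
env = {
    **os.environ,
    "PADDLE_PDX_INFER_BENCHMARK": "True",
    "PADDLE_PDX_INFER_BENCHMARK_WARMUP": "5",
    "PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE": "320",
    "PADDLE_PDX_INFER_BENCHMARK_ITER": "10",
    "PADDLE_PDX_INFER_BENCHMARK_OUTPUT": "./benchmark",
}

# Run the same prediction command with benchmarking enabled.
subprocess.run(
    [
        "python", "main.py",
        "-c", "./paddlex/configs/object_detection/PicoDet-XS.yaml",
        "-o", "Global.mode=predict",
        "-o", "Predict.model_dir=None",
        "-o", "Predict.batch_size=2",
        "-o", "Predict.input=None",
    ],
    env=env,
    check=True,
)
```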

The benchmark information will be printed as follows:

```
+----------------+-----------------+-----------------+------------------------+
|   Component    | Total Time (ms) | Number of Calls | Avg Time Per Call (ms) |
+----------------+-----------------+-----------------+------------------------+
|    ReadCmp     |   99.60412979   |       10        |       9.96041298       |
|     Resize     |   17.01641083   |       20        |       0.85082054       |
|   Normalize    |   44.61312294   |       20        |       2.23065615       |
|   ToCHWImage   |   0.03385544    |       20        |       0.00169277       |
|    Copy2GPU    |   13.46874237   |       10        |       1.34687424       |
|     Infer      |   71.31743431   |       10        |       7.13174343       |
|    Copy2CPU    |   0.39076805    |       10        |       0.03907681       |
| DetPostProcess |   0.36168098    |       20        |       0.01808405       |
+----------------+-----------------+-----------------+------------------------+
+-------------+-----------------+---------------------+----------------------------+
|    Stage    | Total Time (ms) | Number of Instances | Avg Time Per Instance (ms) |
+-------------+-----------------+---------------------+----------------------------+
| PreProcess  |  161.26751900   |         20          |         8.06337595         |
|  Inference  |   85.17694473   |         20          |         4.25884724         |
| PostProcess |   0.36168098    |         20          |         0.01808405         |
|   End2End   |  256.90770149   |         20          |        12.84538507         |
|   WarmUp    |  5412.37807274  |         10          |        541.23780727        |
+-------------+-----------------+---------------------+----------------------------+
```

The first table shows the benchmark information for each component (`Component`): `Total Time` (in ms), `Number of Calls`, and `Avg Time Per Call` (in ms). `Avg Time Per Call` is `Total Time` divided by `Number of Calls`. Note that `Number of Calls` is the number of times the component was called.

The second table shows the benchmark information by stage: `WarmUp`, `PreProcess`, `Inference`, `PostProcess`, and `End2End`. Unlike the first table, `Number of Instances` counts instances (samples), not calls.
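
As a quick sanity check, the averages in both tables are simply the totals divided by the counts. A minimal sketch, with numbers copied from the sample output above (the instances-per-second figure is a derived quantity, not something PaddleX prints):

```python
# Numbers copied from the sample tables above.
copy2gpu_total_ms, copy2gpu_calls = 13.46874237, 10
print(copy2gpu_total_ms / copy2gpu_calls)        # -> 1.346874... (Avg Time Per Call, ms)

preprocess_total_ms, instances = 161.26751900, 20
print(preprocess_total_ms / instances)           # -> 8.063375... (Avg Time Per Instance, ms)

# Derived throughput (instances per second), not printed by PaddleX:
print(instances / (preprocess_total_ms / 1000))  # -> ~124 instances/s
```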

If `PADDLE_PDX_INFER_BENCHMARK_OUTPUT` is set, the benchmark information is also saved to local files (`detail.csv` and `summary.csv`):

```csv
Component,Total Time (ms),Number of Calls,Avg Time Per Call (ms)
ReadCmp,99.60412979125977,10,9.960412979125977
Resize,17.01641082763672,20,0.8508205413818359
Normalize,44.61312294006348,20,2.230656147003174
ToCHWImage,0.033855438232421875,20,0.0016927719116210938
Copy2GPU,13.468742370605469,10,1.3468742370605469
Infer,71.31743431091309,10,7.131743431091309
Copy2CPU,0.39076805114746094,10,0.039076805114746094
DetPostProcess,0.3616809844970703,20,0.018084049224853516
```

```csv
Stage,Total Time (ms),Number of Instances,Avg Time Per Instance (ms)
PreProcess,161.26751899719238,20,8.06337594985962
Inference,85.17694473266602,20,4.258847236633301
PostProcess,0.3616809844970703,20,0.018084049224853516
End2End,256.90770149230957,20,12.845385074615479
WarmUp,5412.3780727386475,10,541.2378072738647
```
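
Since these are plain CSV files, they can be post-processed with standard tooling. A minimal sketch that reads `summary.csv` back from the `./benchmark` directory used in the example above (the column names match the header shown):

```python
import csv

# Read the saved per-stage summary and print the average latency per instance.
with open("./benchmark/summary.csv", newline="") as f:
    for row in csv.DictReader(f):
        stage = row["Stage"]
        avg_ms = float(row["Avg Time Per Instance (ms)"])
        print(f"{stage:<12} {avg_ms:.3f} ms")
```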