Commit 17cc27a

add the en benchmark doc

1 parent a0adda4 commit 17cc27a

File tree

2 files changed: +79 -1 lines changed
Lines changed: 78 additions & 0 deletions
# Model Inference Benchmark

PaddleX supports benchmarking model inference. Enable and configure it via the following environment variables:

* `PADDLE_PDX_INFER_BENCHMARK`: `True` enables the benchmark. `False` by default;
* `PADDLE_PDX_INFER_BENCHMARK_WARMUP`: number of warm-up iterations run on random data before the benchmark starts. Valid only when `input` is set to `None`. `0` by default;
* `PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE`: size of the randomly generated data. Valid only when `input` is set to `None`. `224` by default;
* `PADDLE_PDX_INFER_BENCHMARK_ITER`: number of benchmark iterations using random data. Valid only when `input` is set to `None`. `10` by default;
* `PADDLE_PDX_INFER_BENCHMARK_OUTPUT`: directory to save the benchmark results. `None` by default, meaning the results are not saved.

An example is as follows:
12+
13+
```bash
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE=320 \
PADDLE_PDX_INFER_BENCHMARK_ITER=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT=./benchmark \
python main.py \
    -c ./paddlex/configs/object_detection/PicoDet-XS.yaml \
    -o Global.mode=predict \
    -o Predict.model_dir=None \
    -o Predict.batch_size=2 \
    -o Predict.input=None
```
The benchmark information will be printed as follows:

```
30+
+----------------+-----------------+-----------------+------------------------+
31+
| Component | Total Time (ms) | Number of Calls | Avg Time Per Call (ms) |
32+
+----------------+-----------------+-----------------+------------------------+
33+
| ReadCmp | 99.60412979 | 10 | 9.96041298 |
34+
| Resize | 17.01641083 | 20 | 0.85082054 |
35+
| Normalize | 44.61312294 | 20 | 2.23065615 |
36+
| ToCHWImage | 0.03385544 | 20 | 0.00169277 |
37+
| Copy2GPU | 13.46874237 | 10 | 1.34687424 |
38+
| Infer | 71.31743431 | 10 | 7.13174343 |
39+
| Copy2CPU | 0.39076805 | 10 | 0.03907681 |
40+
| DetPostProcess | 0.36168098 | 20 | 0.01808405 |
41+
+----------------+-----------------+-----------------+------------------------+
42+
+-------------+-----------------+---------------------+----------------------------+
43+
| Stage | Total Time (ms) | Number of Instances | Avg Time Per Instance (ms) |
44+
+-------------+-----------------+---------------------+----------------------------+
45+
| PreProcess | 161.26751900 | 20 | 8.06337595 |
46+
| Inference | 85.17694473 | 20 | 4.25884724 |
47+
| PostProcess | 0.36168098 | 20 | 0.01808405 |
48+
| End2End | 256.90770149 | 20 | 12.84538507 |
49+
| WarmUp | 5412.37807274 | 10 | 541.23780727 |
50+
+-------------+-----------------+---------------------+----------------------------+
51+
```
The first table shows the benchmark information for each component (`Component`): `Total Time` (in ms), `Number of Calls`, and `Avg Time Per Call` (in ms). `Avg Time Per Call` is `Total Time` divided by `Number of Calls`. Note that `Number of Calls` is the number of times the component was called.
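As a quick check of this arithmetic, the per-call averages can be recomputed from the `Total Time` and `Number of Calls` columns; a minimal sketch using a few values from the example table above:

```python
# Recompute Avg Time Per Call = Total Time / Number of Calls
# for some (component, total_ms, calls) rows from the example table.
rows = [
    ("ReadCmp", 99.60412979, 10),
    ("Infer", 71.31743431, 10),
    ("DetPostProcess", 0.36168098, 20),
]

for name, total_ms, calls in rows:
    avg_ms = total_ms / calls
    print(f"{name}: {avg_ms:.8f} ms per call")
```

The printed averages match the `Avg Time Per Call` column of the table.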
The second table shows the benchmark information for each stage: `WarmUp`, `PreProcess`, `Inference`, `PostProcess`, and `End2End`. Unlike the first table, `Number of Instances` is the number of instances (samples) processed, not the number of calls.
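Per-instance latency can also be read as throughput: dividing the number of instances by a stage's total time gives instances per second. This is a derived metric, not something PaddleX reports itself; the sketch below uses the `End2End` row from the example table:

```python
# Derive throughput (instances/s) from the stage table:
# throughput = num_instances / (total_time_ms / 1000)
end2end_total_ms = 256.90770149
num_instances = 20

throughput = num_instances / (end2end_total_ms / 1000.0)
print(f"End2End throughput: {throughput:.2f} instances/s")
```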
Meanwhile, if `PADDLE_PDX_INFER_BENCHMARK_OUTPUT` is set, the benchmark information is also saved to local files (`detail.csv` and `summary.csv`):

```csv
Component,Total Time (ms),Number of Calls,Avg Time Per Call (ms)
ReadCmp,99.60412979125977,10,9.960412979125977
Resize,17.01641082763672,20,0.8508205413818359
Normalize,44.61312294006348,20,2.230656147003174
ToCHWImage,0.033855438232421875,20,0.0016927719116210938
Copy2GPU,13.468742370605469,10,1.3468742370605469
Infer,71.31743431091309,10,7.131743431091309
Copy2CPU,0.39076805114746094,10,0.039076805114746094
DetPostProcess,0.3616809844970703,20,0.018084049224853516
```

```csv
Stage,Total Time (ms),Number of Instances,Avg Time Per Instance (ms)
PreProcess,161.26751899719238,20,8.06337594985962
Inference,85.17694473266602,20,4.258847236633301
PostProcess,0.3616809844970703,20,0.018084049224853516
End2End,256.90770149230957,20,12.845385074615479
WarmUp,5412.3780727386475,10,541.2378072738647
```
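Once saved, these files can be post-processed with standard tooling. A minimal sketch parsing the summary with Python's built-in `csv` module; for illustration the CSV text is embedded as a literal, whereas in practice it would be read from `./benchmark/summary.csv` (per the `PADDLE_PDX_INFER_BENCHMARK_OUTPUT` setting in the example above):

```python
import csv
import io

# Contents of summary.csv from the example benchmark run above.
SUMMARY_CSV = """\
Stage,Total Time (ms),Number of Instances,Avg Time Per Instance (ms)
PreProcess,161.26751899719238,20,8.06337594985962
Inference,85.17694473266602,20,4.258847236633301
PostProcess,0.3616809844970703,20,0.018084049224853516
End2End,256.90770149230957,20,12.845385074615479
WarmUp,5412.3780727386475,10,541.2378072738647
"""

# Index average per-instance latency by stage name; in practice replace
# io.StringIO(SUMMARY_CSV) with open("./benchmark/summary.csv", newline="").
reader = csv.DictReader(io.StringIO(SUMMARY_CSV))
avg_by_stage = {
    row["Stage"]: float(row["Avg Time Per Instance (ms)"]) for row in reader
}

print(f"End2End: {avg_by_stage['End2End']:.2f} ms per instance")
```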

docs/module_usage/instructions/benchmark.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ PaddleX supports measuring model inference time, configured via environment variables; spe…
  * `PADDLE_PDX_INFER_BENCHMARK`: set to `True` to enable the benchmark; `False` by default
  * `PADDLE_PDX_INFER_BENCHMARK_WARMUP`: warm-up setting; before testing starts, iterate n times on random data; `0` by default
  * `PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE`: size of the random data; `224` by default
- * `PADDLE_PDX_INFER_BENCHMARK_ITER`: number of benchmark iterations using random data; random data is used for testing only when the input data is `None`;
+ * `PADDLE_PDX_INFER_BENCHMARK_ITER`: number of benchmark iterations using random data; random data is used for testing only when the input data is `None`; `10` by default
  * `PADDLE_PDX_INFER_BENCHMARK_OUTPUT`: directory for saving the results, e.g. `./benchmark`; `None` by default, meaning the benchmark metrics are not saved;

  Usage example:
