Commit 96d309a

Cherry pick some PR (#1354)
1 parent 0c34d23 commit 96d309a

112 files changed, with 3674 additions and 3666 deletions.

README.md

Lines changed: 32 additions & 20 deletions
@@ -4,40 +4,51 @@
 </p>

 <p align="center">
-<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg"></a>
-<a href="https://paddleslim.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat"></a>
-<a href="https://paddleslim.readthedocs.io/zh_CN/latest/"><img src="https://img.shields.io/badge/中文文档-最新-brightgreen.svg"></a>
+<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
+<a href="https://github.com/PaddlePaddle/PaddleSlim/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/Paddle?color=ffa"></a>
+<a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
+<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
+<a href="https://github.com/PaddlePaddle/PaddleSlim/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSlim?color=9ea"></a>
+<a href="https://pypi.org/project/PaddleSlim/"><img src="https://img.shields.io/pypi/dm/PaddleSlim?color=9cf"></a>
+<a href="https://github.com/PaddlePaddle/PaddleSlim/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSlim?color=9cc"></a>
+<a href="https://github.com/PaddlePaddle/PaddleSlim/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSlim?color=ccf"></a>
 </p>

-PaddleSlim is a tool library focused on deep learning model compression. It provides compression strategies such as **low-bit quantization, knowledge distillation, sparsification, and neural architecture search**, helping users quickly make their models smaller
+PaddleSlim is a tool library focused on deep learning model compression. It provides compression strategies such as **low-bit quantization, knowledge distillation, sparsification, and neural architecture search**, helping developers quickly make their models smaller

 ## Product Updates

+- 🔥 **2022.08.16: Automated compression upgrade**
+- Supports loading ONNX models directly and exporting Paddle models to ONNX
+- Released a trial version of the quantization analysis tool and the [YOLO-series post-training quantization tool](example/post_training_quantization/pytorch_yolo_series/)
+- Updated the [YOLO-Series automated compression model zoo](example/auto_compression/pytorch_yolo_series)
+
+| Model | Base mAP<sup>val<br>0.5:0.95 | ACT quantized mAP<sup>val<br>0.5:0.95 | Model size compression ratio | Inference latency<sup><small>FP32</small><sup><br><sup> | Inference latency<sup><small>INT8</small><sup><br><sup> | Inference speedup |
+| :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: |
+| PPYOLOE-s | 43.1 | 42.6 | 3.9x | 6.51ms | 2.12ms | 3.1x |
+| YOLOv5s | 37.4 | 36.9 | 3.8x | 5.95ms | 1.87ms | 3.2x |
+| YOLOv6s | 42.4 | 41.3 | 3.9x | 9.06ms | 1.83ms | 5.0x |
+| YOLOv7 | 51.1 | 50.9 | 3.9x | 26.84ms | 4.55ms | 5.9x |
+| YOLOv7-Tiny | 37.3 | 37.0 | 3.9x | 5.06ms | 1.68ms | 3.0x |
+
+
 - 🔥 **2022.07.01: Released [v2.3.0](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.0)**

 - Released [automated compression](example/auto_compression)
-
-- Code-agnostic compression: users only need to provide the inference model files and data to run post-training quantization (PTQ), quantization-aware training (QAT), sparse training, and other compression tasks.
+- Code-agnostic compression: developers only need to provide the inference model files and data to run post-training quantization (PTQ), quantization-aware training (QAT), sparse training, and other compression tasks.
 - Automatic strategy selection based on task characteristics and the deployment environment: automatically searches for a suitable post-training quantization method and for the best combination of compression strategies.
 - Released automated compression examples in three areas: [natural language processing](example/auto_compression/nlp), [semantic segmentation](example/auto_compression/semantic_segmentation), and [object detection](example/auto_compression/detection).
-- Released automated compression solutions for `X2Paddle` models: [YOLOv5](example/auto_compression/pytorch_yolov5), [YOLOv6](example/auto_compression/pytorch_yolov6), [YOLOv7](example/auto_compression/pytorch_yolov7), [HuggingFace](example/auto_compression/pytorch_huggingface), [MobileNet](example/auto_compression/tensorflow_mobilenet)
-
+- Released automated compression solutions for `X2Paddle` models: [YOLOv5](example/auto_compression/pytorch_yolo_series), [YOLOv6](example/auto_compression/pytorch_yolo_series), [YOLOv7](example/auto_compression/pytorch_yolo_series), [HuggingFace](example/auto_compression/pytorch_huggingface), [MobileNet](example/auto_compression/tensorflow_mobilenet)
 - Upgraded quantization
-
-- Unified the quantized model format
-- Post-training quantization supports the while op
-- Added 7 [post-training quantization methods](docs/zh_cn/tutorials/quant/post_training_quantization.md), including HIST, AVG, EMD, Bias Correction, and AdaRound
-- Fixed slow quantization-aware training for large BERT models
-
+- Unified the quantized model format; post-training quantization supports the while op; fixed slow quantization-aware training for large BERT models.
+- Added 7 [post-training quantization methods](docs/zh_cn/tutorials/quant/post_training_quantization.md), including HIST, AVG, EMD, Bias Correction, and AdaRound.
 - Supports semi-structured sparse training
-
 - Added a latency estimation tool
+- Supports performance estimation for sparsified and low-bit quantized models; supports estimating the inference performance of a given model in a specific deployment environment (ARM CPU + Paddle Lite); provides estimation interfaces for SD625, SD710, and RK3288 chips + Paddle Lite.
+- Provides a tool that automatically extends deployment environments, adding estimators for more ARM CPU devices.

-- Supports estimating the inference performance of a given model in a specific deployment environment (ARM CPU + Paddle Lite)
-- Provides a tool that automatically extends deployment environments, adding estimators for more ARM CPU devices
-- Supports performance estimation for sparsified and low-bit quantized models
-- Provides estimation interfaces for SD625, SD710, and RK3288 chips + Paddle Lite
-
+<details>
+<summary>Update history</summary>

 - **2021.11.15: Released v2.2.0**

@@ -52,6 +63,7 @@ PaddleSlim is a tool library focused on deep learning model compression, providing **low

 For more information, see the [release note](https://github.com/PaddlePaddle/PaddleSlim/releases)

+</details>

 ## Overview of Basic Compression Features
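
Review note: the speedup column in the benchmark table added above appears to be the FP32 latency divided by the INT8 latency. The sketch below recomputes that ratio from the latencies listed in the table so the column can be checked; the helper name `speedup` is illustrative and not part of PaddleSlim.

```python
# Latencies in milliseconds copied from the README table in this hunk,
# stored as (FP32, INT8) pairs.
LATENCY_MS = {
    "PPYOLOE-s": (6.51, 2.12),
    "YOLOv5s": (5.95, 1.87),
    "YOLOv6s": (9.06, 1.83),
    "YOLOv7": (26.84, 4.55),
    "YOLOv7-Tiny": (5.06, 1.68),
}


def speedup(fp32_ms: float, int8_ms: float) -> float:
    """Return how many times faster INT8 inference is than FP32 inference."""
    if int8_ms <= 0:
        raise ValueError("INT8 latency must be positive")
    return fp32_ms / int8_ms


if __name__ == "__main__":
    for name, (fp32_ms, int8_ms) in LATENCY_MS.items():
        print(f"{name}: {speedup(fp32_ms, int8_ms):.2f}x")
```

The README rounds these ratios to one decimal place, while the table in `example/auto_compression/README.md` reports two.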

demo/quant/pact_quant_aware/train.py

Lines changed: 14 additions & 16 deletions
@@ -65,6 +65,8 @@
         "Whether to use PACT or not.")
 add_arg('analysis', bool, False,
         "Whether analysis variables distribution.")
+add_arg('onnx_format', bool, False,
+        "Whether use onnx format or not.")
 add_arg('ce_test', bool, False, "Whether to CE test.")

 # yapf: enable
@@ -257,6 +259,8 @@ def compress(args):
         'window_size': 10000,
         # The decay coefficient of moving average, default is 0.9
         'moving_rate': 0.9,
+        # Whether use onnx format or not
+        'onnx_format': args.onnx_format,
     }

     # 2. quantization transform programs (training aware)
@@ -298,9 +302,9 @@ def get_optimizer():
         places,
         quant_config,
         scope=None,
-        act_preprocess_func=act_preprocess_func,
-        optimizer_func=optimizer_func,
-        executor=executor,
+        act_preprocess_func=None,
+        optimizer_func=None,
+        executor=None,
         for_test=True)
     compiled_train_prog = quant_aware(
         train_prog,
@@ -425,29 +429,23 @@ def train(epoch, compiled_train_prog, lr):
     # 3. Freeze the graph after training by adjusting the quantize
     #    operators' order for the inference.
     #    The dtype of float_program's weights is float32, but in int8 range.
-    float_program, int8_program = convert(val_program, places, quant_config, \
-                                          scope=None, \
-                                          save_int8=True)
+    model_path = os.path.join(quantization_model_save_dir, args.model)
+    if not os.path.isdir(model_path):
+        os.makedirs(model_path)
+    float_program = convert(val_program, places, quant_config)
     _logger.info("eval best_model after convert")
     final_acc1 = test(best_epoch, float_program)
     _logger.info("final acc:{}".format(final_acc1))

     # 4. Save inference model
-    model_path = os.path.join(quantization_model_save_dir, args.model,
-                              'act_' + quant_config['activation_quantize_type']
-                              + '_w_' + quant_config['weight_quantize_type'])
-    float_path = os.path.join(model_path, 'float')
-    if not os.path.isdir(model_path):
-        os.makedirs(model_path)
-
     paddle.fluid.io.save_inference_model(
-        dirname=float_path,
+        dirname=model_path,
         feeded_var_names=[image.name],
         target_vars=[out],
         executor=exe,
         main_program=float_program,
-        model_filename=float_path + '/model',
-        params_filename=float_path + '/params')
+        model_filename=model_path + '/model.pdmodel',
+        params_filename=model_path + '/model.pdiparams')


 def main():
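
Review note: the last hunk above flattens the save layout from `<save_dir>/<model>/act_..._w_.../float/{model,params}` to `<save_dir>/<model>/{model.pdmodel,model.pdiparams}`. A minimal standard-library sketch of the new layout follows; the function name `inference_model_paths` and its return shape are assumptions for illustration and do not appear in the commit.

```python
import os
import tempfile


def inference_model_paths(save_dir: str, model_name: str):
    """Create <save_dir>/<model_name> and return the paths used when saving.

    Mirrors the os.path.join / os.makedirs logic in the hunk above. The
    function itself is a hypothetical helper, not PaddleSlim code.
    """
    model_path = os.path.join(save_dir, model_name)
    # exist_ok=True covers the isdir() check and makedirs() call in one step.
    os.makedirs(model_path, exist_ok=True)
    model_file = os.path.join(model_path, "model.pdmodel")
    params_file = os.path.join(model_path, "model.pdiparams")
    return model_path, model_file, params_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in inference_model_paths(tmp, "MobileNet"):
            print(path)
```

Because the quantization types are no longer part of the directory name, two runs with different `activation_quantize_type` or `weight_quantize_type` settings would write to the same directory unless the save directory is changed.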

demo/quant/quant_aware/train.py

Lines changed: 3 additions & 4 deletions
@@ -126,6 +126,8 @@ def compress(args):
         'window_size': 10000,
         # The decay coefficient of moving average, default is 0.9
         'moving_rate': 0.9,
+        # Whether use onnx format or not
+        'onnx_format': args.onnx_format,
     }

     pretrain = True
@@ -294,10 +296,7 @@ def train(epoch, compiled_train_prog):
     #    operators' order for the inference.
     #    The dtype of float_program's weights is float32, but in int8 range.
     ############################################################################################################
-    float_program, int8_program = convert(val_program, places, quant_config, \
-                                          scope=None, \
-                                          save_int8=True,
-                                          onnx_format=args.onnx_format)
+    float_program = convert(val_program, places, quant_config)
     print("eval best_model after convert")
     final_acc1 = test(best_epoch, float_program)
     ############################################################################################################

demo/quant/quant_post/eval.py

Lines changed: 3 additions & 4 deletions
@@ -21,8 +21,7 @@
 import paddle
 sys.path[0] = os.path.join(
     os.path.dirname("__file__"), os.path.pardir, os.path.pardir)
-sys.path[1] = os.path.join(
-    os.path.dirname("__file__"), os.path.pardir)
+sys.path[1] = os.path.join(os.path.dirname("__file__"), os.path.pardir)
 import imagenet_reader as reader
 from utility import add_arguments, print_arguments

@@ -31,8 +30,8 @@
 add_arg = functools.partial(add_arguments, argparser=parser)
 add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
 add_arg('model_path', str, "./pruning/checkpoints/resnet50/2/eval_model/", "Whether to use pretrained model.")
-add_arg('model_name', str, '__model__', "model filename for inference model")
-add_arg('params_name', str, '__params__', "params filename for inference model")
+add_arg('model_name', str, 'model.pdmodel', "model filename for inference model")
+add_arg('params_name', str, 'model.pdiparams', "params filename for inference model")
 add_arg('batch_size', int, 64, "Minibatch size.")
 # yapf: enable

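
Review note: the `add_arg(name, type, default, help)` pattern in this hunk relies on a `utility.add_arguments` helper that the diff does not show. The sketch below is a stand-in written under that assumption, using only `argparse`; `str_to_bool`, `add_arguments`, and `build_parser` here are hypothetical reimplementations. It shows how the new filename defaults behave and how a model saved under the old names can still be selected from the command line.

```python
import argparse
import functools


def str_to_bool(value: str) -> bool:
    """Parse common true/false spellings.

    Passing type=bool to argparse would turn the string 'False' into True,
    so boolean flags need an explicit parser like this one.
    """
    lowered = value.strip().lower()
    if lowered in ("1", "true", "t", "yes", "y", "on"):
        return True
    if lowered in ("0", "false", "f", "no", "n", "off"):
        return False
    raise argparse.ArgumentTypeError(f"invalid boolean value: {value!r}")


def add_arguments(argname, type, default, help, argparser, **kwargs):
    """Hypothetical stand-in for the demo's utility.add_arguments helper."""
    parser_type = str_to_bool if type is bool else type
    argparser.add_argument(
        "--" + argname,
        default=default,
        type=parser_type,
        help=help + " Default: %(default)s.",
        **kwargs)


def build_parser() -> argparse.ArgumentParser:
    """Register a subset of the arguments from the hunk above."""
    parser = argparse.ArgumentParser()
    add_arg = functools.partial(add_arguments, argparser=parser)
    add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
    add_arg('model_name', str, 'model.pdmodel',
            "model filename for inference model")
    add_arg('params_name', str, 'model.pdiparams',
            "params filename for inference model")
    return parser


if __name__ == "__main__":
    # Evaluate a model that was saved with the pre-change filenames.
    args = build_parser().parse_args(
        ["--model_name", "__model__", "--params_name", "__params__"])
    print(args.model_name, args.params_name)
```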

docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst

Lines changed: 5 additions & 5 deletions
@@ -3,19 +3,19 @@ AutoCompression automated compression

 AutoCompression
 ---------------
-.. py:class:: paddleslim.auto_compression.AutoCompression(model_dir, model_filename, params_filename, save_dir, strategy_config, train_config, train_dataloader, eval_callback, devices='gpu')
+.. py:class:: paddleslim.auto_compression.AutoCompression(model_dir, train_dataloader, model_filename, params_filename, save_dir, strategy_config, train_config, eval_callback, devices='gpu')

-`Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/auto_compression.py#L32>`_
+`Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/auto_compression.py#L49>`_

 Compresses an inference model saved with the ``paddle.jit.save`` or ``paddle.static.save_inference_model`` interface, according to the specified configuration.

 **Parameters:**

 - **model_dir(str)** - Directory containing the inference model to compress.
+- **train_dataloader(paddle.io.DataLoader)** - Training data iterator. Note: if the post-training quantization hyperparameter search strategy is selected, ``train_dataloader`` and ``eval_callback`` can be set to the same data reader.
 - **model_filename(str)** - File name of the inference model to compress.
 - **params_filename(str)** - File name of the parameters of the inference model to compress.
 - **save_dir(str)** - Directory in which the compressed model is saved.
-- **train_dataloader(paddle.io.DataLoader)** - Training data iterator. Note: if the post-training quantization hyperparameter search strategy is selected, ``train_dataloader`` and ``eval_callback`` can be set to the same data reader.
 - **train_config(dict)** - Training configuration. For the configurable parameters, see `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L103>`_ . Note: if the post-training quantization hyperparameter search strategy is selected, set ``train_config`` to ``None``.
 - **strategy_config(dict, list(dict), optional)** - Compression strategies to use. Several single strategies can be set to apply them in parallel. The dictionary keys must be among:
 ``Quantization`` (quantization configuration; for the configurable parameters, see `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24>`_ ),
@@ -82,13 +82,13 @@ AutoCompression

 eval_dataloader = Cifar10(mode='eval')

-ac = AutoCompression(model_path, model_filename, params_filename, save_dir, \
+ac = AutoCompression(model_path, train_dataloader, model_filename, params_filename, save_dir, \

 strategy_config="Quantization": Quantization(**default_ptq_config),

 "Distillation": HyperParameterOptimization(**default_distill_config)}, \

-train_config=None, train_dataloader=train_dataloader, eval_callback=eval_dataloader,devices='gpu')
+train_config=None, eval_callback=eval_dataloader,devices='gpu')

 ```
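
Review note: this hunk moves `train_dataloader` from the seventh positional parameter to the second, so callers that pass arguments by position are silently rebound. The sketch below uses a stub class, `AutoCompressionStub`, which is hypothetical and copies only the documented parameter order (it is not the PaddleSlim class), to show the failure mode and why keyword arguments avoid it.

```python
class AutoCompressionStub:
    """Hypothetical stand-in copying only the documented parameter order."""

    def __init__(self, model_dir, train_dataloader, model_filename,
                 params_filename, save_dir, strategy_config, train_config,
                 eval_callback, devices='gpu'):
        self.model_dir = model_dir
        self.train_dataloader = train_dataloader
        self.model_filename = model_filename
        self.params_filename = params_filename
        self.save_dir = save_dir
        self.strategy_config = strategy_config
        self.train_config = train_config
        self.eval_callback = eval_callback
        self.devices = devices


def build_with_keywords(train_dataloader, eval_callback):
    """Keyword arguments bind correctly under the old and the new order."""
    return AutoCompressionStub(
        model_dir="./model",
        model_filename="model.pdmodel",
        params_filename="model.pdiparams",
        save_dir="./output",
        strategy_config=None,
        train_config=None,
        train_dataloader=train_dataloader,
        eval_callback=eval_callback,
        devices='gpu')


if __name__ == "__main__":
    loader = ["batch-0", "batch-1"]  # placeholder for a real DataLoader

    # A call written for the OLD order, passed positionally to the NEW order:
    # the model filename lands in the train_dataloader slot.
    stale = AutoCompressionStub("./model", "model.pdmodel", "model.pdiparams",
                                "./output", None, None, loader, None)
    print("positional:", stale.train_dataloader)

    safe = build_with_keywords(loader, eval_callback=None)
    print("keyword:", safe.train_dataloader)
```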

docs/zh_cn/api_cn/static/quant/quantization_api.rst

Lines changed: 5 additions & 5 deletions
@@ -118,7 +118,7 @@ quant_post_dynamic
 quant_post_static
 ---------------

-.. py:function:: paddleslim.quant.quant_post_static(executor,model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='__model__', save_params_filename='__params__', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False)
+.. py:function:: paddleslim.quant.quant_post_static(executor,model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='model.pdmodel', save_params_filename='model.pdiparams', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False)

 `Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py>`_

@@ -217,15 +217,15 @@ quant_post_static
         target_vars=[out],
         main_program=val_prog,
         executor=exe,
-        model_filename='__model__',
-        params_filename='__params__')
+        model_filename='model.pdmodel',
+        params_filename='model.pdiparams')
     quant_post_static(
         executor=exe,
         model_dir='./model_path',
         quantize_model_path='./save_path',
         sample_generator=val_reader,
-        model_filename='__model__',
-        params_filename='__params__',
+        model_filename='model.pdmodel',
+        params_filename='model.pdiparams',
         batch_size=16,
         batch_nums=10)
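
Review note: with the default save names changing from `__model__` / `__params__` to `model.pdmodel` / `model.pdiparams`, directories written before and after this commit use different file names. A small standard-library sketch for picking the right pair is shown below; `find_model_files` and `NAMING_SCHEMES` are hypothetical names, not part of PaddleSlim.

```python
import os
import tempfile

# (model file, params file) pairs: the new defaults first, then the old ones.
NAMING_SCHEMES = (
    ("model.pdmodel", "model.pdiparams"),
    ("__model__", "__params__"),
)


def find_model_files(model_dir: str):
    """Return the first (model_filename, params_filename) pair found in model_dir."""
    for model_name, params_name in NAMING_SCHEMES:
        has_model = os.path.isfile(os.path.join(model_dir, model_name))
        has_params = os.path.isfile(os.path.join(model_dir, params_name))
        if has_model and has_params:
            return model_name, params_name
    raise FileNotFoundError(
        f"no known inference model files in {model_dir!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a directory saved with the pre-change names.
        for name in ("__model__", "__params__"):
            open(os.path.join(tmp, name), "wb").close()
        print(find_model_files(tmp))
```

The returned pair could then be passed as `model_filename` and `params_filename` in a call like the documented `quant_post_static` example.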

example/auto_compression/README.md

Lines changed: 5 additions & 5 deletions
@@ -82,15 +82,15 @@ Compared with traditional model compression methods, ACT
 | [Semantic segmentation](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 |
 | NLP | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 |
 | NLP | ERNIE 3.0-Medium | 73.09 | 72.40 | 29.25(fp16) | 19.61 | **1.49** | NVIDIA Tesla T4 |
-| [Object detection](./pytorch_yolov5) | YOLOv5s<br/>(PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
-| [Object detection](./pytorch_yolov6) | YOLOv6s<br/>(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
-| [Object detection](./pytorch_yolov7) | YOLOv7<br/>(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
-| [Object detection](./detection) | PP-YOLOE-l | 50.9 | 50.6 | 11.2 | 6.7 | **1.67** | NVIDIA Tesla V100 |
+| [Object detection](./pytorch_yolo_series) | YOLOv5s<br/>(PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
+| [Object detection](./pytorch_yolo_series) | YOLOv6s<br/>(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
+| [Object detection](./pytorch_yolo_series) | YOLOv7<br/>(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
+| [Object detection](./detection) | PP-YOLOE-s | 43.1 | 42.6 | 6.51 | 2.12 | **3.07** | NVIDIA Tesla T4 |
 | [Image classification](./image_classification) | MobileNetV1<br/>(TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDMM865 (Snapdragon 865) |

 - Note: object detection accuracy is measured as mAP (0.5:0.95). Image segmentation accuracy is measured as IoU.
 - For more PaddlePaddle model examples and benchmarks, see: [image classification](./image_classification), [object detection](./detection), [semantic segmentation](./semantic_segmentation), [natural language processing](./nlp)
-- For more examples and benchmarks for other frameworks, see: [YOLOv5(PyTorch)](./pytorch_yolov5), [YOLOv6(PyTorch)](./pytorch_yolov6), [YOLOv7(PyTorch)](./pytorch_yolov7), [HuggingFace(PyTorch)](./pytorch_huggingface), [MobileNet(TensorFlow)](./tensorflow_mobilenet)
+- For more examples and benchmarks for other frameworks, see: [YOLOv5(PyTorch)](./pytorch_yolo_series), [YOLOv6(PyTorch)](./pytorch_yolo_series), [YOLOv7(PyTorch)](./pytorch_yolo_series), [HuggingFace(PyTorch)](./pytorch_huggingface), [MobileNet(TensorFlow)](./tensorflow_mobilenet)

 ## **Environment Setup**

example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ Distillation:
   loss: soft_label

 Quantization:
+  onnx_format: true
   use_pact: true
   activation_quantize_type: 'moving_average_abs_max'
   quantize_op_types:
