PaddlePaddle
diff --git a/README.md b/README.md
Lines changed: 32 additions & 20 deletions
diff --git a/demo/quant/pact_quant_aware/train.py b/demo/quant/pact_quant_aware/train.py
Lines changed: 14 additions & 16 deletions
diff --git a/demo/quant/quant_aware/train.py b/demo/quant/quant_aware/train.py
Lines changed: 3 additions & 4 deletions
diff --git a/demo/quant/quant_post/eval.py b/demo/quant/quant_post/eval.py
Lines changed: 3 additions & 4 deletions
diff --git a/docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst b/docs/zh_cn/api_cn/static/auto-compression/auto_compression_api.rst
Lines changed: 5 additions & 5 deletions
diff --git a/docs/zh_cn/api_cn/static/quant/quantization_api.rst b/docs/zh_cn/api_cn/static/quant/quantization_api.rst
Lines changed: 5 additions & 5 deletions
diff --git a/example/auto_compression/README.md b/example/auto_compression/README.md
Lines changed: 5 additions & 5 deletions
diff --git a/example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml b/example/auto_compression/detection/configs/ppyoloe_l_qat_dis.yaml
Lines changed: 1 addition & 0 deletions
@@ -4,40 +4,51 @@
 </p>
 
 <p align="center">
-    <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-blue.svg"></a>
-    <a href="https://paddleslim.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat"></a>
-    <a href="https://paddleslim.readthedocs.io/zh_CN/latest/"><img src="https://img.shields.io/badge/中文文档-最新-brightgreen.svg"></a>
+    <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
+    <a href="https://github.com/PaddlePaddle/PaddleSlim/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/Paddle?color=ffa"></a>
+    <a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
+    <a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
+    <a href="https://github.com/PaddlePaddle/PaddleSlim/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSlim?color=9ea"></a>
+    <a href="https://pypi.org/project/PaddleSlim/"><img src="https://img.shields.io/pypi/dm/PaddleSlim?color=9cf"></a>
+    <a href="https://github.com/PaddlePaddle/PaddleSlim/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSlim?color=9cc"></a>
+    <a href="https://github.com/PaddlePaddle/PaddleSlim/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSlim?color=ccf"></a>
 </p>
 
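-PaddleSlim is a toolkit dedicated to deep learning model compression. It provides compression strategies such as **low-bit quantization, knowledge distillation, sparsification, and neural architecture search**, helping users quickly reduce model size.
+PaddleSlim is a toolkit dedicated to deep learning model compression. It provides compression strategies such as **low-bit quantization, knowledge distillation, sparsification, and neural architecture search**, helping developers quickly reduce model size.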
 
 ## Product Updates
 
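+- 🔥 **2022.08.16: Automatic compression upgrade**
+  - Supports loading ONNX models directly and exporting Paddle models to ONNX
+  - Released a trial version of the quantization analysis tool and the [YOLO-series post-training quantization tool](example/post_training_quantization/pytorch_yolo_series/)
+  - Updated the [YOLO-Series automatic compression model zoo](example/auto_compression/pytorch_yolo_series)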
+
+  | Model  | Base mAP<sup>val<br>0.5:0.95 | ACT-quantized mAP<sup>val<br>0.5:0.95  | Model size compression ratio | Inference latency<sup><small>FP32</small><sup><br><sup>  | Inference latency<sup><small>INT8</small><sup><br><sup> | Inference speedup |
+  | :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: |
+  | PPYOLOE-s | 43.1 | 42.6  | 3.9x  | 6.51ms  | 2.12ms  | 3.1x |
+  | YOLOv5s | 37.4   | 36.9  | 3.8x  | 5.95ms  |  1.87ms | 3.2x |
+  | YOLOv6s | 42.4   | 41.3 | 3.9x |  9.06ms  |   1.83ms   | 5.0x   |
+  | YOLOv7 |  51.1   | 50.9 | 3.9x |  26.84ms  |   4.55ms   |  5.9x  |
+  | YOLOv7-Tiny | 37.3   | 37.0 | 3.9x | 5.06ms  |   1.68ms   |  3.0x  |
+
+
 - 🔥 **2022.07.01: Released [v2.3.0](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.0)**
 
   - Released [automatic compression](example/auto_compression)
-
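-    - Supports code-free compression: users only need to provide the inference model files and data to run compression tasks such as post-training quantization (PTQ), quantization-aware training (QAT), and sparse training.
+    - Supports code-free compression: developers only need to provide the inference model files and data to run compression tasks such as post-training quantization (PTQ), quantization-aware training (QAT), and sparse training.
     - Supports automatic strategy selection based on task characteristics and the deployment environment: automatically searches for a suitable post-training quantization method and for the best combination of compression strategies.
     - Released automatic compression examples in three areas: [natural language processing](example/auto_compression/nlp), [semantic segmentation](example/auto_compression/semantic_segmentation), and [object detection](example/auto_compression/detection).
-    - Released automatic compression solutions for `X2Paddle` models: [YOLOv5](example/auto_compression/pytorch_yolov5), [YOLOv6](example/auto_compression/pytorch_yolov6), [YOLOv7](example/auto_compression/pytorch_yolov7), [HuggingFace](example/auto_compression/pytorch_huggingface), [MobileNet](example/auto_compression/tensorflow_mobilenet).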
-
+    - Released automatic compression solutions for `X2Paddle` models: [YOLOv5](example/auto_compression/pytorch_yolo_series), [YOLOv6](example/auto_compression/pytorch_yolo_series), [YOLOv7](example/auto_compression/pytorch_yolo_series), [HuggingFace](example/auto_compression/pytorch_huggingface), [MobileNet](example/auto_compression/tensorflow_mobilenet).
   - Upgraded quantization
-
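-    - Unified the quantized model format
-    - Post-training quantization supports the while op
-    - Added 7 [post-training quantization methods](docs/zh_cn/tutorials/quant/post_training_quantization.md), including HIST, AVG, EMD, Bias Correction, and AdaRound
-    - Fixed slow quantization-aware training for large BERT models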
-
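+    - Unified the quantized model format; post-training quantization supports the while op; fixed slow quantization-aware training for large BERT models.
+    - Added 7 [post-training quantization methods](docs/zh_cn/tutorials/quant/post_training_quantization.md), including HIST, AVG, EMD, Bias Correction, and AdaRound.
   - Supports semi-structured sparse training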
-
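   - Added a latency estimation tool
+    - Supports performance estimation for sparse models and low-bit quantized models; supports estimating the inference performance of a given model in a specific deployment environment (ARM CPU + Paddle Lite); provides estimation interfaces for SD625, SD710, and RK3288 chips with Paddle Lite.
+    - Provides a tool that automatically extends the supported deployment environments, adding estimators for more ARM CPU devices.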
 
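-    - Supports estimating the inference performance of a given model in a specific deployment environment (ARM CPU + Paddle Lite)
-    - Provides a tool that automatically extends the supported deployment environments, adding estimators for more ARM CPU devices
-    - Supports performance estimation for sparse models and low-bit quantized models
-    - Provides estimation interfaces for SD625, SD710, and RK3288 chips with Paddle Lite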
-
+<details>
+<summary>Previous updates</summary>
 
 - **2021.11.15: Released v2.2.0**
 
@@ -52,6 +63,7 @@ PaddleSlim是一个专注于深度学习模型压缩的工具库，提供**低
 
 For more information, see the [release note](https://github.com/PaddlePaddle/PaddleSlim/releases)
 
+</details>
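The speedup column in the 2022.08.16 table appears to be the FP32 latency divided by the INT8 latency, rounded to one decimal place. That is an inference from the published numbers, not something the README states. A minimal sketch that recomputes the column from the latencies listed in the table:

```python
# FP32 and INT8 latencies in milliseconds, copied from the 2022.08.16 table.
latencies_ms = {
    "PPYOLOE-s": (6.51, 2.12),
    "YOLOv5s": (5.95, 1.87),
    "YOLOv6s": (9.06, 1.83),
    "YOLOv7": (26.84, 4.55),
    "YOLOv7-Tiny": (5.06, 1.68),
}


def speedup(fp32_ms, int8_ms):
    """Return how many times faster the INT8 model runs than the FP32 model."""
    return fp32_ms / int8_ms


# Round to one decimal place, matching the precision used in the table.
speedups = {name: round(speedup(*pair), 1) for name, pair in latencies_ms.items()}

for name, value in speedups.items():
    print(f"{name}: {value}x")
```

Under that assumption the recomputed values agree with the table for all five models.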
 
 ## 基础压缩功能概览
 
 
@@ -65,6 +65,8 @@
         "Whether to use PACT or not.")
 add_arg('analysis',          bool, False,
         "Whether analysis variables distribution.")
+add_arg('onnx_format',          bool, False,
+        "Whether use onnx format or not.")
 add_arg('ce_test',                 bool,   False,       "Whether to CE test.")
 
 # yapf: enable
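The new `onnx_format` flag is declared with type `bool`. The diff does not show how the demo's `add_arg` helper handles it, so the following is only a sketch under the assumption that the flag ends up in `argparse`. With plain `type=bool`, any non-empty string (including `"False"`) parses as `True`, so a converter is needed. `str2bool` and `build_parser` are hypothetical names used for illustration.

```python
import argparse


def str2bool(text):
    """Convert common true/false spellings to a bool (hypothetical helper)."""
    value = text.strip().lower()
    if value in ("1", "true", "t", "yes", "y", "on"):
        return True
    if value in ("0", "false", "f", "no", "n", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser that exposes only the flag added in this change."""
    parser = argparse.ArgumentParser(description="quantization demo flags (sketch)")
    parser.add_argument("--onnx_format", type=str2bool, default=False,
                        help="Whether to use the ONNX format or not.")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--onnx_format", "True"])

# The parsed flag is forwarded into the quantization config, as in the diff.
quant_config = {"moving_rate": 0.9, "onnx_format": args.onnx_format}
print(quant_config)
```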
@@ -257,6 +259,8 @@ def compress(args):
         'window_size': 10000,
         # The decay coefficient of moving average, default is 0.9
         'moving_rate': 0.9,
+        # Whether use onnx format or not
+        'onnx_format': args.onnx_format,
     }
 
     # 2. quantization transform programs (training aware)
@@ -298,9 +302,9 @@ def get_optimizer():
         places,
         quant_config,
         scope=None,
-        act_preprocess_func=act_preprocess_func,
-        optimizer_func=optimizer_func,
-        executor=executor,
+        act_preprocess_func=None,
+        optimizer_func=None,
+        executor=None,
         for_test=True)
     compiled_train_prog = quant_aware(
         train_prog,
@@ -425,29 +429,23 @@ def train(epoch, compiled_train_prog, lr):
     # 3. Freeze the graph after training by adjusting the quantize
     #    operators' order for the inference.
     #    The dtype of float_program's weights is float32, but in int8 range.
-    float_program, int8_program = convert(val_program, places, quant_config, \
-                                                        scope=None, \
-                                                        save_int8=True)
+    model_path = os.path.join(quantization_model_save_dir, args.model)
+    if not os.path.isdir(model_path):
+        os.makedirs(model_path)
+    float_program = convert(val_program, places, quant_config)
     _logger.info("eval best_model after convert")
     final_acc1 = test(best_epoch, float_program)
     _logger.info("final acc:{}".format(final_acc1))
 
     # 4. Save inference model
-    model_path = os.path.join(quantization_model_save_dir, args.model,
-                              'act_' + quant_config['activation_quantize_type']
-                              + '_w_' + quant_config['weight_quantize_type'])
-    float_path = os.path.join(model_path, 'float')
-    if not os.path.isdir(model_path):
-        os.makedirs(model_path)
-
     paddle.fluid.io.save_inference_model(
-        dirname=float_path,
+        dirname=model_path,
         feeded_var_names=[image.name],
         target_vars=[out],
         executor=exe,
         main_program=float_program,
-        model_filename=float_path + '/model',
-        params_filename=float_path + '/params')
+        model_filename=model_path + '/model.pdmodel',
+        params_filename=model_path + '/model.pdiparams')
 
 
 def main():
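The save step above now writes a single model under `<save_dir>/<model>` using the `model.pdmodel` and `model.pdiparams` filenames. A minimal sketch of that path logic, kept free of Paddle so it runs on its own: `inference_model_paths` is a hypothetical helper and `"MobileNet"` is a placeholder model name. It uses `os.makedirs(..., exist_ok=True)` in place of the `isdir` check followed by `makedirs`.

```python
import os
import tempfile


def inference_model_paths(save_dir, model_name):
    """Create the output directory and return (dirname, model file, params file)."""
    model_path = os.path.join(save_dir, model_name)
    # exist_ok=True makes the call safe when the directory already exists.
    os.makedirs(model_path, exist_ok=True)
    return (
        model_path,
        os.path.join(model_path, "model.pdmodel"),
        os.path.join(model_path, "model.pdiparams"),
    )


with tempfile.TemporaryDirectory() as root:
    dirname, model_file, params_file = inference_model_paths(root, "MobileNet")
    created = os.path.isdir(dirname)
    print(os.path.relpath(model_file, root))
```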
 
@@ -126,6 +126,8 @@ def compress(args):
         'window_size': 10000,
         # The decay coefficient of moving average, default is 0.9
         'moving_rate': 0.9,
+        # Whether use onnx format or not
+        'onnx_format': args.onnx_format,
     }
 
     pretrain = True
@@ -294,10 +296,7 @@ def train(epoch, compiled_train_prog):
     #    operators' order for the inference.
     #    The dtype of float_program's weights is float32, but in int8 range.
     ############################################################################################################
-    float_program, int8_program = convert(val_program, places, quant_config, \
-                                                        scope=None, \
-                                                        save_int8=True,
-                                                        onnx_format=args.onnx_format)
+    float_program = convert(val_program, places, quant_config)
     print("eval best_model after convert")
     final_acc1 = test(best_epoch, float_program)
     ############################################################################################################
 
@@ -21,8 +21,7 @@
 import paddle
 sys.path[0] = os.path.join(
     os.path.dirname("__file__"), os.path.pardir, os.path.pardir)
-sys.path[1] = os.path.join(
-    os.path.dirname("__file__"), os.path.pardir)
+sys.path[1] = os.path.join(os.path.dirname("__file__"), os.path.pardir)
 import imagenet_reader as reader
 from utility import add_arguments, print_arguments
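The `sys.path` lines above pass the string literal `"__file__"` to `os.path.dirname`, not the module variable `__file__`. The literal has no directory component, so the resulting entries are relative to the current working directory. That works when the script is launched from its own directory, which may be the intent here. The sketch below shows the difference; `script_parent` is a hypothetical helper, not part of the demo.

```python
import os

# The string literal "__file__" has no directory component.
literal_dir = os.path.dirname("__file__")

# Joining an empty directory with ".." yields a path relative to the
# current working directory.
relative_parent = os.path.join(literal_dir, os.path.pardir)


def script_parent(script_file):
    """Return the absolute parent of the directory that holds script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(script_dir, os.path.pardir))


print(repr(literal_dir), relative_parent)
print(script_parent(os.path.join("demo", "quant", "quant_post", "eval.py")))
```

Inside a script, `script_parent(__file__)` would give a location that does not depend on where the script is launched from.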
 
@@ -31,8 +30,8 @@
 add_arg = functools.partial(add_arguments, argparser=parser)
 add_arg('use_gpu',          bool, True,                 "Whether to use GPU or not.")
 add_arg('model_path', str,  "./pruning/checkpoints/resnet50/2/eval_model/",                 "Whether to use pretrained model.")
-add_arg('model_name', str,  '__model__', "model filename for inference model")
-add_arg('params_name', str, '__params__', "params filename for inference model")
+add_arg('model_name', str,  'model.pdmodel', "model filename for inference model")
+add_arg('params_name', str, 'model.pdiparams', "params filename for inference model")
 add_arg('batch_size',       int,  64,                 "Minibatch size.")
 # yapf: enable
 
 
@@ -3,19 +3,19 @@ AutoCompression自动压缩功能
 
 AutoCompression
 ---------------
-.. py:class:: paddleslim.auto_compression.AutoCompression(model_dir, model_filename, params_filename, save_dir, strategy_config, train_config, train_dataloader, eval_callback, devices='gpu')
+.. py:class:: paddleslim.auto_compression.AutoCompression(model_dir, train_dataloader, model_filename, params_filename, save_dir, strategy_config, train_config, eval_callback, devices='gpu')
 
-`Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/auto_compression.py#L32>`_
+`Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/auto_compression.py#L49>`_
 
 Compresses an inference model saved with ``paddle.jit.save`` or ``paddle.static.save_inference_model`` according to the specified configuration.
 
 **Parameters:**
 
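 - **model_dir(str)** - Directory containing the inference model to compress.
+- **train_dataloader(paddle.io.DataLoader)** - Training data loader. Note: if you choose the post-training quantization hyperparameter search strategy, set ``train_dataloader`` and ``eval_callback`` to the same data reader.
 - **model_filename(str)** - File name of the inference model to compress.
 - **params_filename(str)** - File name of the parameters of the inference model to compress.
 - **save_dir(str)** - Directory where the compressed model is saved.
-- **train_dataloader(paddle.io.DataLoader)** - Training data loader. Note: if you choose the post-training quantization hyperparameter search strategy, set ``train_dataloader`` and ``eval_callback`` to the same data reader.
 - **train_config(dict)** - Training configuration. For the configurable parameters, see `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L103>`_ . Note: if you choose the post-training quantization hyperparameter search strategy, set ``train_config`` to ``None``.
 - **strategy_config(dict, list(dict), optional)** - Compression strategies to use; several individual strategies can be set to apply them together. The dictionary keys must be among: 
              ``Quantization`` (quantization configuration; for the configurable parameters, see `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24>`_ ), 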
@@ -82,13 +82,13 @@ AutoCompression
 
    eval_dataloader = Cifar10(mode='eval')
 
-   ac = AutoCompression(model_path, model_filename, params_filename, save_dir, \
+   ac = AutoCompression(model_path, train_dataloader, model_filename, params_filename, save_dir, \
 
                         strategy_config="Quantization": Quantization(**default_ptq_config), 
 
                         "Distillation": HyperParameterOptimization(**default_distill_config)}, \
 
-                        train_config=None, train_dataloader=train_dataloader, eval_callback=eval_dataloader,devices='gpu')
+                        train_config=None, eval_callback=eval_dataloader,devices='gpu')
 
 ```
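Because this change moves `train_dataloader` from the seventh positional slot to the second, a call written positionally for the old order would bind arguments to the wrong parameters without raising an error. The sketch below uses two stand-in functions that only mirror the parameter orders shown in the signature lines above. They are not the real `AutoCompression` class, and the argument values are placeholders.

```python
def auto_compression_old(model_dir, model_filename, params_filename, save_dir,
                         strategy_config, train_config, train_dataloader,
                         eval_callback, devices="gpu"):
    """Stand-in with the pre-change parameter order; returns its bound arguments."""
    return dict(locals())


def auto_compression_new(model_dir, train_dataloader, model_filename,
                         params_filename, save_dir, strategy_config,
                         train_config, eval_callback, devices="gpu"):
    """Stand-in with the post-change parameter order; returns its bound arguments."""
    return dict(locals())


# Placeholder values; real callers would pass a DataLoader and a callback.
kwargs = dict(
    model_dir="./model",
    train_dataloader="train_loader",
    model_filename="model.pdmodel",
    params_filename="model.pdiparams",
    save_dir="./out",
    strategy_config=None,
    train_config=None,
    eval_callback="eval_fn",
)

# Keyword calls bind identically under both parameter orders.
old_bound = auto_compression_old(**kwargs)
new_bound = auto_compression_new(**kwargs)

# A positional call written for the old order silently shifts under the new one.
shifted = auto_compression_new("./model", "model.pdmodel", "model.pdiparams",
                               "./out", None, None, "train_loader", "eval_fn")
print(shifted["train_dataloader"])
```

Passing everything after `model_dir` by keyword keeps call sites stable across this kind of reordering.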
 
 
@@ -118,7 +118,7 @@ quant_post_dynamic
 quant_post_static
 ---------------
 
-.. py:function:: paddleslim.quant.quant_post_static(executor,model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='__model__', save_params_filename='__params__', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False)
+.. py:function:: paddleslim.quant.quant_post_static(executor,model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='model.pdmodel', save_params_filename='model.pdiparams', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False)
 
 `Source code <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py>`_
 
@@ -217,15 +217,15 @@ quant_post_static
         target_vars=[out],
         main_program=val_prog,
         executor=exe,
-        model_filename='__model__',
-        params_filename='__params__')
+        model_filename='model.pdmodel',
+        params_filename='model.pdiparams')
     quant_post_static(
         executor=exe,
         model_dir='./model_path',
         quantize_model_path='./save_path',
         sample_generator=val_reader,
-        model_filename='__model__',
-        params_filename='__params__',
+        model_filename='model.pdmodel',
+        params_filename='model.pdiparams',
         batch_size=16,
         batch_nums=10)
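With the default filenames changing from `__model__` / `__params__` to `model.pdmodel` / `model.pdiparams`, a directory saved before this change will not match the new defaults. The following sketch is one way a caller might detect which naming a directory uses before passing `model_filename` and `params_filename` explicitly. `find_model_files` and `KNOWN_NAMES` are hypothetical, not part of PaddleSlim.

```python
import os
import tempfile

# (model file, params file) pairs: the new defaults first, then the former ones.
KNOWN_NAMES = [
    ("model.pdmodel", "model.pdiparams"),
    ("__model__", "__params__"),
]


def find_model_files(model_dir):
    """Return the first known (model, params) filename pair present in model_dir."""
    for model_name, params_name in KNOWN_NAMES:
        present = all(
            os.path.isfile(os.path.join(model_dir, name))
            for name in (model_name, params_name)
        )
        if present:
            return model_name, params_name
    raise FileNotFoundError(f"no known model/params pair in {model_dir!r}")


# Simulate a directory that was saved with the former default names.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ("__model__", "__params__"):
        open(os.path.join(model_dir, name), "wb").close()
    found = find_model_files(model_dir)

print(found)
```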
 
 
@@ -82,15 +82,15 @@ ACT相比传统的模型压缩方法，
 | [Semantic segmentation](./semantic_segmentation) | UNet                         | 65.00                  | 64.93                  | 15.29            | 10.23            | **1.49**   | NVIDIA Tesla T4   |
 | NLP                             | PP-MiniLM                    | 72.81                 | 72.44                 | 128.01           | 17.97            | **7.12**   | NVIDIA Tesla T4   |
 | NLP                             | ERNIE 3.0-Medium             | 73.09                 | 72.40                 | 29.25(fp16)      | 19.61            | **1.49**   | NVIDIA Tesla T4   |
-| [Object detection](./pytorch_yolov5)             | YOLOv5s<br/>(PyTorch)        | 37.40                  | 36.9                   | 5.95             | 1.87             | **3.18**   | NVIDIA Tesla T4   |
-| [Object detection](./pytorch_yolov6)             | YOLOv6s<br/>(PyTorch)        | 42.4                  | 41.3                   | 9.06             | 1.83             | **4.95**   | NVIDIA Tesla T4   |
-| [Object detection](./pytorch_yolov7)             | YOLOv7<br/>(PyTorch)        | 51.1                  | 50.8                   | 26.84             | 4.55             | **5.89**   | NVIDIA Tesla T4   |
-| [Object detection](./detection)             | PP-YOLOE-l                   | 50.9                   | 50.6                   | 11.2             | 6.7              | **1.67**   | NVIDIA Tesla V100 |
+| [Object detection](./pytorch_yolo_series)             | YOLOv5s<br/>(PyTorch)        | 37.40                  | 36.9                   | 5.95             | 1.87             | **3.18**   | NVIDIA Tesla T4   |
+| [Object detection](./pytorch_yolo_series)             | YOLOv6s<br/>(PyTorch)        | 42.4                  | 41.3                   | 9.06             | 1.83             | **4.95**   | NVIDIA Tesla T4   |
+| [Object detection](./pytorch_yolo_series)             | YOLOv7<br/>(PyTorch)        | 51.1                  | 50.8                   | 26.84             | 4.55             | **5.89**   | NVIDIA Tesla T4   |
+| [Object detection](./detection)             | PP-YOLOE-s                   | 43.1                   | 42.6                   |  6.51  |   2.12   |  **3.07**  | NVIDIA Tesla T4 |
 | [Image classification](./image_classification)  | MobileNetV1<br/>(TensorFlow) | 71.0                   | 70.22                  | 30.45            | 15.86            |  **1.92**  | SDMM865 (Snapdragon 865)     |  
 
 - Note: object detection accuracy is reported as mAP (0.5:0.95); image segmentation accuracy is reported as IoU.
 - For more PaddlePaddle model examples and benchmarks, see: [image classification](./image_classification), [object detection](./detection), [semantic segmentation](./semantic_segmentation), and [natural language processing](./nlp).
-- For more examples and benchmarks from other frameworks, see: [YOLOv5(PyTorch)](./pytorch_yolov5), [YOLOv6(PyTorch)](./pytorch_yolov6), [YOLOv7(PyTorch)](./pytorch_yolov7), [HuggingFace(PyTorch)](./pytorch_huggingface), [MobileNet(TensorFlow)](./tensorflow_mobilenet).
+- For more examples and benchmarks from other frameworks, see: [YOLOv5(PyTorch)](./pytorch_yolo_series), [YOLOv6(PyTorch)](./pytorch_yolo_series), [YOLOv7(PyTorch)](./pytorch_yolo_series), [HuggingFace(PyTorch)](./pytorch_huggingface), [MobileNet(TensorFlow)](./tensorflow_mobilenet).
 
 ## **Environment Setup**
 
 
@@ -12,6 +12,7 @@ Distillation:
   loss: soft_label
 
 Quantization:
+  onnx_format: true
   use_pact: true
   activation_quantize_type: 'moving_average_abs_max'
   quantize_op_types: