From f62f49d8bd5a15bc7a3dbf2686187f212bd732e7 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Mon, 24 Oct 2022 13:01:52 +0000
Subject: [PATCH 1/9] docs: fix invalid links

---
 docs/en/PULC/PULC_car_exists_en.md                        | 2 +-
 docs/en/PULC/PULC_language_classification_en.md           | 2 +-
 docs/en/PULC/PULC_person_attribute_en.md                  | 2 +-
 docs/en/PULC/PULC_person_exists_en.md                     | 2 +-
 docs/en/PULC/PULC_safety_helmet_en.md                     | 2 +-
 docs/en/PULC/PULC_text_image_orientation_en.md            | 2 +-
 docs/en/PULC/PULC_textline_orientation_en.md              | 4 ++--
 docs/en/PULC/PULC_traffic_sign_en.md                      | 2 +-
 docs/en/PULC/PULC_vehicle_attribute_en.md                 | 2 +-
 docs/en/algorithm_introduction/ISE_ReID_en.md             | 2 +-
 docs/en/algorithm_introduction/reid.md                    | 2 +-
 .../image_recognition_pipeline/feature_extraction_en.md   | 4 ++--
 .../classification_serving_deploy_en.md                   | 2 +-
 docs/en/inference_deployment/export_model_en.md           | 2 +-
 .../inference_deployment/paddle_hub_serving_deploy_en.md  | 4 ++--
 .../inference_deployment/recognition_serving_deploy_en.md | 4 ++--
 docs/en/quick_start/quick_start_recognition_en.md         | 8 ++++----
 17 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/docs/en/PULC/PULC_car_exists_en.md b/docs/en/PULC/PULC_car_exists_en.md
index 33c0932e6f..91cc3733a0 100644
--- a/docs/en/PULC/PULC_car_exists_en.md
+++ b/docs/en/PULC/PULC_car_exists_en.md
@@ -438,7 +438,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_language_classification_en.md b/docs/en/PULC/PULC_language_classification_en.md
index c7cd5f5db9..450f3e7562 100644
--- a/docs/en/PULC/PULC_language_classification_en.md
+++ b/docs/en/PULC/PULC_language_classification_en.md
@@ -451,7 +451,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_person_attribute_en.md b/docs/en/PULC/PULC_person_attribute_en.md
index 07981781e0..3d6f70fcab 100644
--- a/docs/en/PULC/PULC_person_attribute_en.md
+++ b/docs/en/PULC/PULC_person_attribute_en.md
@@ -450,7 +450,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_person_exists_en.md b/docs/en/PULC/PULC_person_exists_en.md
index baf5ce3e4c..31d452fd76 100644
--- a/docs/en/PULC/PULC_person_exists_en.md
+++ b/docs/en/PULC/PULC_person_exists_en.md
@@ -439,7 +439,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_safety_helmet_en.md b/docs/en/PULC/PULC_safety_helmet_en.md
index d2e5cb3293..45bf57ecad 100644
--- a/docs/en/PULC/PULC_safety_helmet_en.md
+++ b/docs/en/PULC/PULC_safety_helmet_en.md
@@ -413,7 +413,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_text_image_orientation_en.md b/docs/en/PULC/PULC_text_image_orientation_en.md
index 1d3cc41f99..cf530905d2 100644
--- a/docs/en/PULC/PULC_text_image_orientation_en.md
+++ b/docs/en/PULC/PULC_text_image_orientation_en.md
@@ -447,7 +447,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_textline_orientation_en.md b/docs/en/PULC/PULC_textline_orientation_en.md
index d11307d0b5..fc4d540bc0 100644
--- a/docs/en/PULC/PULC_textline_orientation_en.md
+++ b/docs/en/PULC/PULC_textline_orientation_en.md
@@ -56,7 +56,7 @@ It can be seen that high accuracy can be getted when backbone is SwinTranformer_
 
 **Note**:
 * Backbone name without \* means the resolution is 224x224, and with \* means the resolution is 48x192 (h\*w). The stride of the network is changed to `[2, [2, 1], [2, 1], [2, 1]`. Please refer to [PaddleOCR]（ https://github.com/PaddlePaddle/PaddleOCR）for more details.
-* Backbone name with \*\* means that the resolution is 80x160 (h\*w), and the stride of the network is changed to `[2, [2, 1], [2, 1], [2, 1]]`. This resolution is searched by [Hyperparameter Searching](pulc_train_en.md#4).
+* Backbone name with \*\* means that the resolution is 80x160 (h\*w), and the stride of the network is changed to `[2, [2, 1], [2, 1], [2, 1]]`. This resolution is searched by [Hyperparameter Searching](PULC_train_en.md#4).
 * The Latency is tested on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz. The MKLDNN is enabled and the number of threads is 10.
 * About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
 
@@ -431,7 +431,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_traffic_sign_en.md b/docs/en/PULC/PULC_traffic_sign_en.md
index baa0faf482..e235ca9272 100644
--- a/docs/en/PULC/PULC_traffic_sign_en.md
+++ b/docs/en/PULC/PULC_traffic_sign_en.md
@@ -456,7 +456,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/PULC/PULC_vehicle_attribute_en.md b/docs/en/PULC/PULC_vehicle_attribute_en.md
index 2a16526557..b831e71002 100644
--- a/docs/en/PULC/PULC_vehicle_attribute_en.md
+++ b/docs/en/PULC/PULC_vehicle_attribute_en.md
@@ -464,7 +464,7 @@ PaddleClas provides an example about how to deploy with C++. Please refer to [De
 
 Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
 
-PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/classification_serving_deploy_en.md).
 
 <a name="6.5"></a>
 
diff --git a/docs/en/algorithm_introduction/ISE_ReID_en.md b/docs/en/algorithm_introduction/ISE_ReID_en.md
index e509a52018..a5b782eead 100644
--- a/docs/en/algorithm_introduction/ISE_ReID_en.md
+++ b/docs/en/algorithm_introduction/ISE_ReID_en.md
@@ -16,7 +16,7 @@ ISE (Implicit Sample Extension) is a simple, efficient, and effective learning a
 > Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, Jingdong Wang<br>
 > CVPR2022
 
-![image](../../images/ISE_ReID/ISE_pipeline.png)
+![image](../../images/ISE_pipeline.png)
 
 <a name='2'></a>
 ## 2. Performance on Market1501 and MSMT17
diff --git a/docs/en/algorithm_introduction/reid.md b/docs/en/algorithm_introduction/reid.md
index 5471fa3b34..12da4d2e7c 100644
--- a/docs/en/algorithm_introduction/reid.md
+++ b/docs/en/algorithm_introduction/reid.md
@@ -1,4 +1,4 @@
-English | [简体中文](../../zh_CN/algorithm_introduction/reid.md)
+English | [简体中文](../../zh_CN/algorithm_introduction/ReID.md)
 
 # ReID pedestrian re-identification
 
diff --git a/docs/en/image_recognition_pipeline/feature_extraction_en.md b/docs/en/image_recognition_pipeline/feature_extraction_en.md
index 856d475372..a3809b8afe 100644
--- a/docs/en/image_recognition_pipeline/feature_extraction_en.md
+++ b/docs/en/image_recognition_pipeline/feature_extraction_en.md
@@ -112,7 +112,7 @@ Based on the `GeneralRecognitionV2_PPLCNetV2_base.yaml` configuration file, the
 
 ### 5.1 Data Preparation
 
-First you need to customize your own dataset based on the task. Please refer to [Dataset Format Description](../data_preparation/recognition_dataset.md) for the dataset format and file structure.
+First you need to customize your own dataset based on the task. Please refer to [Dataset Format Description](../data_preparation/recognition_dataset_en.md) for the dataset format and file structure.
 
 After the preparation is complete, it is necessary to modify the content related to the data configuration in the configuration file, mainly including the path of the dataset and the number of categories. As is as shown below:
 
@@ -256,7 +256,7 @@ wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02
  -3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02]
 ```
 
-In most cases, just getting the features may not meet the users' requirements. If you want to go further on the image recognition task, you can refer to the document [Vector Search](./vector_search.md).
+In most cases, just getting the features may not meet the users' requirements. If you want to go further on the image recognition task, you can refer to the document [Vector Search](./vector_search_en.md).
 
 <a name="6"></a>
 
diff --git a/docs/en/inference_deployment/classification_serving_deploy_en.md b/docs/en/inference_deployment/classification_serving_deploy_en.md
index a46b000702..86a774f0eb 100644
--- a/docs/en/inference_deployment/classification_serving_deploy_en.md
+++ b/docs/en/inference_deployment/classification_serving_deploy_en.md
@@ -1,4 +1,4 @@
-English | [简体中文](../../zh_CN/inference_deployment/classification_serving_deploy.md)
+English | [简体中文](../../zh_CN/deployment/image_classification/paddle_serving.md)
 
 # Classification model service deployment
 
diff --git a/docs/en/inference_deployment/export_model_en.md b/docs/en/inference_deployment/export_model_en.md
index 8fe3c46ad9..062ee1bcec 100644
--- a/docs/en/inference_deployment/export_model_en.md
+++ b/docs/en/inference_deployment/export_model_en.md
@@ -99,5 +99,5 @@ The inference model exported is used to deployment by using prediction engine. Y
 * [C++ inference](./cpp_deploy_en.md)(Only support classification)
 * [Python Whl inference](./whl_deploy_en.md)(Only support classification)
 * [PaddleHub Serving inference](./paddle_hub_serving_deploy_en.md)(Only support classification)
-* [PaddleServing inference](./paddle_serving_deploy_en.md)
+* [PaddleServing inference](./classification_serving_deploy_en.md)
 * [PaddleLite inference](./paddle_lite_deploy_en.md)(Only support classification)
diff --git a/docs/en/inference_deployment/paddle_hub_serving_deploy_en.md b/docs/en/inference_deployment/paddle_hub_serving_deploy_en.md
index 4dddc94bd8..213b5822e0 100644
--- a/docs/en/inference_deployment/paddle_hub_serving_deploy_en.md
+++ b/docs/en/inference_deployment/paddle_hub_serving_deploy_en.md
@@ -1,4 +1,4 @@
-English | [简体中文](../../zh_CN/inference_deployment/paddle_hub_serving_deploy.md)
+English | [简体中文](../../zh_CN/deployment/image_classification/paddle_hub.md)
 
 # Service deployment based on PaddleHub Serving
 
@@ -53,7 +53,7 @@ Before installing the service module, you need to prepare the inference model an
   "inference_model_dir": "../inference/"
   ```
 * Model files (including `.pdmodel` and `.pdiparams`) must be named `inference`.
-* We provide a large number of pre-trained models based on the ImageNet-1k dataset. For the model list and download address, see [Model Library Overview](../algorithm_introduction/ImageNet_models.md), or you can use your own trained and converted models.
+* We provide a large number of pre-trained models based on the ImageNet-1k dataset. For the model list and download address, see [Model Library Overview](../algorithm_introduction/ImageNet_models_en.md), or you can use your own trained and converted models.
 
 
 <a name="4"></a>
diff --git a/docs/en/inference_deployment/recognition_serving_deploy_en.md b/docs/en/inference_deployment/recognition_serving_deploy_en.md
index 6d1db098d9..b21b22d3ea 100644
--- a/docs/en/inference_deployment/recognition_serving_deploy_en.md
+++ b/docs/en/inference_deployment/recognition_serving_deploy_en.md
@@ -1,4 +1,4 @@
-English | [简体中文](../../zh_CN/inference_deployment/recognition_serving_deploy.md)
+English | [简体中文](../../zh_CN/deployment/PP-ShiTu/paddle_serving.md)
 
 # Recognition model service deployment
 
@@ -219,7 +219,7 @@ Different from Python Serving, the C++ Serving client calls C++ OP to predict, s
   # One-click compile and install Serving server, set SERVING_BIN
   source ./build_server.sh python3.7
   ```
-  **Note:** The path set by [build_server.sh](../build_server.sh#L55-L62) may need to be modified according to the actual machine environment such as CUDA, python version, etc., and then compiled; If you encounter a non-network error during the execution of `build_server.sh`, you can manually copy the commands in the script to the terminal for execution.
+  **Note:** The path set by [build_server.sh](../../../deploy/paddleserving/build_server.sh#L55-L62) may need to be modified according to the actual machine environment such as CUDA, python version, etc., and then compiled; If you encounter a non-network error during the execution of `build_server.sh`, you can manually copy the commands in the script to the terminal for execution.
 
 - The input and output format used by C++ Serving is different from that of Python, so you need to execute the following command to overwrite the files below [3.1] (#31-model conversion) by copying the 4 files to get the corresponding 4 prototxt files in the folder.
   ```shell
diff --git a/docs/en/quick_start/quick_start_recognition_en.md b/docs/en/quick_start/quick_start_recognition_en.md
index 670ad03e80..1d93728de0 100644
--- a/docs/en/quick_start/quick_start_recognition_en.md
+++ b/docs/en/quick_start/quick_start_recognition_en.md
@@ -71,7 +71,7 @@ Click the "save index" button above <img src="../../images/quick_start/android_d
 Click the "initialize index" button above <img src="../../images/quick_start/android_demo/reset_100.png" width="25" height="25"/> to initialize the current library to `original`.
 
 #### 1.2.5 Preview Index
-Click the "class preview" button <img src="../../images/quick_start/android_demo/leibichaxun_100.png" width="25" height="25"/> to view it in the pop-up window.
+Click the "class preview" button <img src="../../images/quick_start/android_demo/leibiechaxun_100.png" width="25" height="25"/> to view it in the pop-up window.
 
 <a name="Feature introduction"></a>
 
@@ -99,7 +99,7 @@ One can preview it according to the instructions in [Function Experience - Previ
 
 ### 2.1 Environment configuration
 
-* Installation: Please refer to the document [Environment Preparation](../installation/install_paddleclas.md) to configure the PaddleClas operating environment.
+* Installation: Please refer to the document [Environment Preparation](../installation/install_paddleclas_en.md) to configure the PaddleClas operating environment.
 
 * Go to the `deploy` run directory. All the content and scripts in this section need to be run in the `deploy` directory, you can enter the `deploy` directory with the following scripts.
 
@@ -315,7 +315,7 @@ Build a new index database `index_all` with the following scripts.
 python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v2.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all"
 ```
 
-The final constructed new index database is saved in the folder `./drink_dataset_v2.0/index_all`. For specific instructions on yaml `yaml`, please refer to [Vector Search Documentation](../image_recognition_pipeline/vector_search.md).
+The newly built index database is saved in the folder `./drink_dataset_v2.0/index_all`. For details on the `yaml` configuration, please refer to [Vector Search Documentation](../image_recognition_pipeline/vector_search_en.md).
 
 <a name="Image recognition based on the new index database"></a>
 
@@ -392,4 +392,4 @@ After decompression, the `recognition_demo_data_v1.1` folder should have the fol
 
 After downloading the model and test data according to the above steps, you can re-build the index database and test the relevant recognition model.
 
-* For more introduction to object detection, please refer to: [Object Detection Tutorial Document](../image_recognition_pipeline/mainbody_detection.md); for the introduction of feature extraction, please refer to: [Feature Extraction Tutorial Document](../image_recognition_pipeline/feature_extraction.md); for the introduction to vector search, please refer to: [vector search tutorial document](../image_recognition_pipeline/vector_search.md).
+* For more introduction to object detection, please refer to: [Object Detection Tutorial Document](../image_recognition_pipeline/mainbody_detection_en.md); for the introduction of feature extraction, please refer to: [Feature Extraction Tutorial Document](../image_recognition_pipeline/feature_extraction_en.md); for the introduction to vector search, please refer to: [vector search tutorial document](../image_recognition_pipeline/vector_search_en.md).

From 363c274ea034909b25f142859103dd099b71e02f Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Sun, 30 Oct 2022 15:07:14 +0000
Subject: [PATCH 2/9] docs: supplement relevant english docs

---
 .../knowledge_distillation_en.md              | 810 ++++++++++++++++++
 docs/en/models/PP-LCNetV2.md                  |  79 ++
 2 files changed, 889 insertions(+)
 create mode 100644 docs/en/advanced_tutorials/knowledge_distillation_en.md
 create mode 100644 docs/en/models/PP-LCNetV2.md

diff --git a/docs/en/advanced_tutorials/knowledge_distillation_en.md b/docs/en/advanced_tutorials/knowledge_distillation_en.md
new file mode 100644
index 0000000000..82ee3a6e7a
--- /dev/null
+++ b/docs/en/advanced_tutorials/knowledge_distillation_en.md
@@ -0,0 +1,810 @@
+
+# Knowledge Distillation Practice
+
+## Contents
+
+- [1. Introduction](#1)
+    - [1.1 Introduction to Knowledge Distillation](#1.1)
+        - [1.1.1 Response based distillation](#1.1.1)
+        - [1.1.2 Feature based distillation](#1.1.2)
+        - [1.1.3 Relation based distillation](#1.1.3)
+    - [1.2 Knowledge Distillation Algorithms Supported by PaddleClas](#1.2)
+        - [1.2.1 SSLD](#1.2.1)
+        - [1.2.2 DML](#1.2.2)
+        - [1.2.3 UDML](#1.2.3)
+        - [1.2.4 AFD](#1.2.4)
+        - [1.2.5 DKD](#1.2.5)
+        - [1.2.6 DIST](#1.2.6)
+        - [1.2.7 MGD](#1.2.7)
+        - [1.2.8 WSL](#1.2.8)
+- [2. Usage](#2)
+    - [2.1 Environment Configuration](#2.1)
+    - [2.2 Data Preparation](#2.2)
+    - [2.3 Model Training](#2.3)
+    - [2.4 Model Evaluation](#2.4)
+    - [2.5 Model Prediction](#2.5)
+    - [2.6 Model Export & Inference](#2.6)
+- [3. References](#3)
+
+<a name="1"></a>
+
+## 1. Introduction
+
+<a name="1.1"></a>
+
+### 1.1 Introduction to Knowledge Distillation
+
+In recent years, deep neural networks have proven effective for problems in computer vision, natural language processing, and other fields. With an appropriately constructed and trained network, model performance generally exceeds that of traditional algorithms.
+
+When enough data is available, enlarging a well-designed network (that is, increasing its number of parameters) can significantly improve performance, but it also increases model complexity, and large models are expensive to deploy in practice.
+
+Deep neural networks contain redundancy, and several methods exist to compress a model and reduce its parameter count, such as pruning, quantization, and knowledge distillation. Knowledge distillation trains a smaller network (the student) under the supervision of a larger network (the teacher), so that the small model gains a substantial performance improvement, and can even approach the accuracy of the large model, without adding parameters.
+
+Knowledge distillation methods fall into three categories: response-based distillation, feature-based distillation, and relation-based distillation. Each is introduced below.
+
+<a name='1.1.1'></a>
+
+#### 1.1.1 Response based distillation
+
+Knowledge distillation (KD) was first proposed by Hinton, who introduced KL divergence to the training loss function in addition to the cross entropy between the output logits and the ground truth labels. The accuracy of models trained with KD exceeds the accuracy of the same models trained only using ground truth loss. It should be noted that a larger teacher model needs to be trained first to guide the training process of student models.
+
+PaddleClas proposed SSLD [6], a simple yet effective algorithm that removes the dependence on ground truth labels. Combined with a large amount of unlabeled data, distillation improved the accuracy of the pretrained models for 18 models by more than 3%.
+
+The standard distillation method above uses a large model as the teacher to guide the student and improve its performance. Later, the Deep Mutual Learning (DML) distillation method [7] was proposed, in which two models with the same architecture learn from each other. Unlike KD and other knowledge distillation algorithms, DML does not rely on a large teacher model, so the training process is simpler and more efficient.
+
+<a name='1.1.2'></a>
+
+#### 1.1.2 Feature based distillation
+
+Heo et al. proposed Overhaul of Feature Distillation [8], in which the feature map distance between the teacher and the student is used as the distillation loss. The student's features are transformed to match the shape of the teacher's features so that the distance can be computed.
+
+Knowledge distillation methods based on feature map distance can be combined with the response-based knowledge distillation algorithms mentioned in [1.1.1](#1.1.1), i.e., the student's outputs and its intermediate feature maps are supervised simultaneously. For DML, this combination is even simpler, since the student's features can be aligned with the teacher's features without transformation. This method is used in PP-OCRv2, where it improves the accuracy of OCR models significantly.
+
+<a name='1.1.3'></a>
+
+#### 1.1.3 Relation based distillation
+
+Papers in [1.1.1](#1.1.1) and [1.1.2](#1.1.2) mainly consider the outputs and middle feature maps of student and teacher. These distillation algorithms focus on individual outputs and do not consider relations between individuals.
+
+Park et al. proposed RKD (Relational Knowledge Distillation) [10], a relation-based distillation algorithm that considers the mutual relations among data examples.
+
+
+RKD transfers the structured relations among the teacher model's outputs to the student model. Unlike the previous algorithms, which only focus on individual outputs, RKD uses two loss functions: a distance-wise distillation loss and an angle-wise distillation loss. The final distillation loss combines the KD loss and the RKD loss, and the resulting accuracy is better than that obtained with the KD loss alone.
+
+<a name='1.2'></a>
+
+### 1.2 Knowledge Distillation Algorithms in PaddleClas
+
+<a name='1.2.1'></a>
+
+#### 1.2.1 SSLD
+
+##### 1.2.1.1 Introduction to SSLD
+
+Paper:
+
+> [Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones](https://arxiv.org/abs/2103.05959)
+>
+> Cheng Cui, Ruoyu Guo, Yuning Du, Dongliang He, Fu Li, Zewu Wu, Qiwen Liu, Shilei Wen, Jizhou Huang, Xiaoguang Hu, Dianhai Yu, Errui Ding, Yanjun Ma
+>
+> arxiv, 2021
+
+SSLD is a simple semi-supervised distillation method proposed by Baidu in 2021. By using an improved JS divergence as the loss function, combined with a data mining strategy based on the ImageNet22k dataset, it improved the accuracy of 18 backbone models by more than 3% on average.
+
+For more information about the principle, model zoo and usage of SSLD, please refer to: [Introduction to SSLD](ssld.md).
+
+
+##### 1.2.1.2 Configuration of SSLD
+
+The SSLD configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The teacher model has fixed parameters, and the pretrained parameters are loaded. In the `Loss` field, you need to define `DistillationDMLLoss` as the training loss.
+
+```yaml
+# model architecture
+Arch:
+  name: "DistillationModel"    # model name, here distillation model is used
+  class_num: &class_num 1000   # number of classes, 1000 for ImageNet1k
+  pretrained_list:             # list of pretrained models, leave blank because specified in sub-models
+  freeze_params_list:          # list of freeze flags, the sub-network at the corresponding index is frozen if set to True
+  - True
+  - False
+  infer_model_name: "Student"  # export Student sub-network when exporting model
+  models:                      # list of sub-networks
+    - Teacher:                 # teacher model
+        name: ResNet50_vd      # model name
+        class_num: *class_num  # number of classes
+        pretrained: True       # pretrained model path, download official pretrained model if set to True
+        use_ssld: True         # whether SSLD pretrained model is used (higher accuracy)
+    - Student:                 # student model
+        name: PPLCNet_x2_5     # model name
+        class_num: *class_num  # number of classes
+        pretrained: False      # pretrained model path, can be bool or string. Set to False here. Student model does not load the pretrained model by default
+
+# loss function config for training/eval process
+Loss:                           # loss function
+  Train:                        # list of training losses
+    - DistillationDMLLoss:      # distillation DMLLoss. DMLLoss is encapsulated to support loss function of distillation (in dict)
+        weight: 1.0             # weight of loss
+        model_name_pairs:       # model pair used to compute loss. Here loss function between Student and Teacher is computed
+        - ["Student", "Teacher"]
+  Eval:                         # evaluation loss
+    - CELoss:
+        weight: 1.0
+```
+
+<a name='1.2.2'></a>
+
+#### 1.2.2 DML
+
+##### 1.2.2.1 Introduction to DML
+
+Paper:
+
+> [Deep Mutual Learning](https://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Deep_Mutual_Learning_CVPR_2018_paper.html)
+>
+> Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu
+>
+> CVPR, 2018
+
+In the DML paper, distillation does not depend on a pretrained teacher model. Two models with the same architecture are trained together and learn from each other, each using the KL divergence between the two models' predictions as an additional loss.
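+
+As a rough sketch of the objective described in the DML paper (the notation is introduced here for illustration and is not taken from PaddleClas), let $p_1$ and $p_2$ be the class distributions predicted by the two networks for one sample, and let $L_{C_1}$ and $L_{C_2}$ be their cross-entropy losses on the ground-truth label. Each network then minimizes its own supervised loss plus the KL divergence from its peer's prediction:
+
+```math
+\begin{aligned}
+D_{KL}(p_2 \,\|\, p_1) &= \sum_{m=1}^{M} p_2^{m} \log \frac{p_2^{m}}{p_1^{m}}, \\
+L_{\Theta_1} &= L_{C_1} + D_{KL}(p_2 \,\|\, p_1), \\
+L_{\Theta_2} &= L_{C_2} + D_{KL}(p_1 \,\|\, p_2).
+\end{aligned}
+```
+
+Here $M$ is the number of classes. In the configuration of the next subsection, `DistillationGTCELoss` appears to play the role of the cross-entropy terms and `DistillationDMLLoss` that of the mutual term.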
+
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - |
+| DML | PPLCNet_x2_5 | [PPLCNet_x2_5_dml.yaml](../../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml) | 76.68%(**+1.75%**) | - |
+
+
+* Note: The released PPLCNet_x2_5 model was trained for 360 epochs. For a fair comparison, both the baseline and the DML model here were trained for 100 epochs, so the baseline accuracy is lower than that of the released model (76.60%).
+
+
+##### 1.2.2.2 Configuration of DML
+
+The DML configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. Both models need to be updated. In the `Loss` field, you need to define `DistillationDMLLoss` (JS-Div between student and teacher) and `DistillationGTCELoss` (CE loss with ground truth labels) as the training loss.
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  class_num: &class_num 1000
+  pretrained_list:
+  freeze_params_list:        # mutual learning, params of both models are not frozen
+  - False
+  - False
+  models:
+    - Teacher:
+        name: PPLCNet_x2_5   # mutual learning, so pretrained models are not loaded for both models
+        class_num: *class_num
+        pretrained: False
+    - Student:
+        name: PPLCNet_x2_5
+        class_num: *class_num
+        pretrained: False
+
+Loss:
+  Train:
+    - DistillationGTCELoss:    # CE loss with ground truth labels needs to be computed for both models because pretrained models are not loaded
+        weight: 1.0
+        model_names: ["Student", "Teacher"]
+    - DistillationDMLLoss:
+        weight: 1.0
+        model_name_pairs:
+        - ["Student", "Teacher"]
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+<a name='1.2.3'></a>
+
+#### 1.2.3 UDML
+
+##### 1.2.3.1 Introduction to UDML
+
+
+UDML is a teacher-free knowledge distillation algorithm proposed by the PaddleCV group as an improvement on DML. In addition to the outputs, it also uses the features of the middle layers in the distillation process to further improve distillation accuracy. For more information about UDML and its application, please refer to: [PP-ShiTu paper](https://arxiv.org/abs/2111.00775) and [PP-OCRv2 paper](https://arxiv.org/abs/2109.03144).
+
+
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - |
+| UDML | PPLCNet_x2_5 | [PPLCNet_x2_5_udml.yaml](../../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml) | 76.74%(**+1.81%**) | - |
+
+
+##### 1.2.3.2 Configuration of UDML
+
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  class_num: &class_num 1000
+  # if not null, its lengths should be same as models
+  pretrained_list:
+  # if not null, its lengths should be same as models
+  freeze_params_list:
+  - False
+  - False
+  models:
+    - Teacher:
+        name: PPLCNet_x2_5
+        class_num: *class_num
+        pretrained: False
+        # return_patterns means that in addition to the output logits, the middle feature maps with the corresponding names will also be returned
+        return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"]
+    - Student:
+        name: PPLCNet_x2_5
+        class_num: *class_num
+        pretrained: False
+        return_patterns: ["blocks3", "blocks4", "blocks5", "blocks6"]
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - DistillationGTCELoss:
+        weight: 1.0
+        key: logits
+        model_names: ["Student", "Teacher"]
+    - DistillationDMLLoss:
+        weight: 1.0
+        key: logits
+        model_name_pairs:
+        - ["Student", "Teacher"]
+    - DistillationDistanceLoss:  # distance loss based on features. Here l2 loss is used to calculate the distance between the blocks5 features
+        weight: 1.0
+        key: "blocks5"
+        model_name_pairs:
+        - ["Student", "Teacher"]
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+**Note:** `return_patterns` is specified in the network above. Returning middle layer features is implemented on top of TheseusLayer. For more information about the usage of TheseusLayer, please refer to: [Usage of TheseusLayer](theseus_layer.md).
+
+
+<a name='1.2.4'></a>
+
+#### 1.2.4 AFD
+
+##### 1.2.4.1 Introduction to AFD
+
+Paper:
+
+
+> [Show, attend and distill: Knowledge distillation via attention-based feature matching](https://arxiv.org/abs/2102.02973)
+>
+> Mingi Ji, Byeongho Heo, Sungrae Park
+>
+> AAAI, 2021
+
+AFD uses an attention-based meta-network to learn the relative similarities between teacher and student features during distillation, and applies the identified similarities to control the distillation intensity of all possible feature map pairs.
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| AFD | ResNet18 | [resnet34_distill_resnet18_afd.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_afd.yaml) | 71.68%(**+0.88%**) | - |
+
+Note: To stay aligned with the training configuration in the paper, the number of training epochs is set to 100, so the baseline accuracy is lower than that of the open source model in PaddleClas (71.0%).
+
+##### 1.2.4.2 Configuration of AFD
+
+The AFD configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The teacher model has fixed parameters. In the `Loss` field, you need to define `DistillationKLDivLoss` (KL-Div between student and teacher), `AFDLoss` (AFD loss between student and teacher) and `DistillationGTCELoss` (CE loss with ground truth labels) as the training loss.
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  pretrained_list:
+  freeze_params_list:
+  models:
+    - Teacher:
+        name: AttentionModel # contains several serial networks. The following networks take outputs of previous networks as input
+        pretrained_list:
+        freeze_params_list:
+          - True
+          - False
+        models:
+          # basic network of AttentionModel
+          - ResNet34:
+              name: ResNet34
+              pretrained: True
+              # return_patterns means that in addition to the output logits, the middle feature maps with the corresponding names will also be returned
+              return_patterns: &t_keys ["blocks[0]", "blocks[1]", "blocks[2]", "blocks[3]",
+                                        "blocks[4]", "blocks[5]", "blocks[6]", "blocks[7]",
+                                        "blocks[8]", "blocks[9]", "blocks[10]", "blocks[11]",
+                                        "blocks[12]", "blocks[13]", "blocks[14]", "blocks[15]"]
+          # transformation network of AttentionModel. It transforms the sub-networks in the basic network
+          - LinearTransformTeacher:
+              name: LinearTransformTeacher
+              qk_dim: 128
+              keys: *t_keys
+              t_shapes: &t_shapes [[64, 56, 56], [64, 56, 56], [64, 56, 56], [128, 28, 28],
+                                   [128, 28, 28], [128, 28, 28], [128, 28, 28], [256, 14, 14],
+                                   [256, 14, 14], [256, 14, 14], [256, 14, 14], [256, 14, 14],
+                                   [256, 14, 14], [512, 7, 7], [512, 7, 7], [512, 7, 7]]
+
+    - Student:
+        name: AttentionModel
+        pretrained_list:
+        freeze_params_list:
+          - False
+          - False
+        models:
+          - ResNet18:
+              name: ResNet18
+              pretrained: False
+              return_patterns: &s_keys ["blocks[0]", "blocks[1]", "blocks[2]", "blocks[3]",
+                                        "blocks[4]", "blocks[5]", "blocks[6]", "blocks[7]"]
+          - LinearTransformStudent:
+              name: LinearTransformStudent
+              qk_dim: 128
+              keys: *s_keys
+              s_shapes: &s_shapes [[64, 56, 56], [64, 56, 56], [128, 28, 28], [128, 28, 28],
+                                   [256, 14, 14], [256, 14, 14], [512, 7, 7], [512, 7, 7]]
+              t_shapes: *t_shapes
+
+  infer_model_name: "Student"
+
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - DistillationGTCELoss:
+        weight: 1.0
+        model_names: ["Student"]
+        key: logits
+    - DistillationKLDivLoss:  # distillation KL-Div loss, features are extracted according to names in model_name_pairs to calculate loss
+        weight: 0.9           # weight of loss
+        model_name_pairs: [["Student", "Teacher"]]
+        temperature: 4
+        key: logits
+    - AFDLoss:                # AFD loss
+        weight: 50.0
+        model_name_pair: ["Student", "Teacher"]
+        student_keys: ["bilinear_key", "value"]
+        teacher_keys: ["query", "value"]
+        s_shapes: *s_shapes
+        t_shapes: *t_shapes
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+**Note:** `return_patterns` is specified in the network above. Returning middle layer features is implemented on top of TheseusLayer. For more information about the usage of TheseusLayer, please refer to: [Usage of TheseusLayer](theseus_layer.md).
+
+<a name='1.2.5'></a>
+
+#### 1.2.5 DKD
+
+##### 1.2.5.1 Introduction to DKD
+
+Paper:
+
+
+> [Decoupled Knowledge Distillation](https://arxiv.org/abs/2203.08679)
+>
+> Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang
+>
+> CVPR, 2022
+
+DKD reformulates the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD). The effect of the two parts is studied separately, and their weights can be adjusted independently, improving the accuracy and flexibility of distillation.
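+
+The decomposition can be sketched as follows, using notation assumed here for illustration: $p^{\mathcal{T}}$ and $p^{\mathcal{S}}$ are the teacher and student class probabilities, $t$ is the target class, $b = [p_t,\ 1 - p_t]$ is the binary target/non-target distribution, and $\hat{p}_i = p_i / (1 - p_t)$ for $i \neq t$ is the distribution over the non-target classes only.
+
+```math
+\begin{aligned}
+\mathrm{TCKD} &= D_{KL}\left(b^{\mathcal{T}} \,\|\, b^{\mathcal{S}}\right), \\
+\mathrm{NCKD} &= D_{KL}\left(\hat{p}^{\mathcal{T}} \,\|\, \hat{p}^{\mathcal{S}}\right), \\
+L_{KD} &= \mathrm{TCKD} + \left(1 - p_t^{\mathcal{T}}\right) \mathrm{NCKD}, \\
+L_{DKD} &= \alpha \, \mathrm{TCKD} + \beta \, \mathrm{NCKD}.
+\end{aligned}
+```
+
+Classical KD ties the NCKD weight to the teacher's confidence through the factor $1 - p_t^{\mathcal{T}}$, while DKD replaces both weights with free hyperparameters. The `alpha` and `beta` fields in the configuration below presumably correspond to $\alpha$ and $\beta$.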
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| DKD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - |
+
+
+##### 1.2.5.2 Configuration of DKD
+
+The DKD configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The teacher model has fixed parameters, and the pretrained parameters are loaded. In the `Loss` field, you need to define `DistillationDKDLoss` (DKD loss between student and teacher) and `DistillationGTCELoss` (CE loss with ground truth labels) as the training loss.
+
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  # if not null, its lengths should be same as models
+  pretrained_list:
+  # if not null, its lengths should be same as models
+  freeze_params_list:
+  - True
+  - False
+  models:
+    - Teacher:
+        name: ResNet34
+        pretrained: True
+
+    - Student:
+        name: ResNet18
+        pretrained: False
+
+  infer_model_name: "Student"
+
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - DistillationGTCELoss:
+        weight: 1.0
+        model_names: ["Student"]
+    - DistillationDKDLoss:
+        weight: 1.0
+        model_name_pairs: [["Student", "Teacher"]]
+        temperature: 1
+        alpha: 1.0
+        beta: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+<a name='1.2.6'></a>
+
+#### 1.2.6 DIST
+
+##### 1.2.6.1 Introduction to DIST
+
+Paper:
+
+
+> [Knowledge Distillation from A Stronger Teacher](https://arxiv.org/pdf/2205.10536v1.pdf)
+>
+> Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu
+>
+> 2022, under review
+
+With the classical KD method, a more accurate teacher model often does not lead to a correspondingly better distillation result. This paper proposes the DIST method, which measures the difference between the student model and the teacher model with the Pearson correlation coefficient instead of the default KL divergence, so that the student model can learn more accurate correlation information.
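+
+A sketch of the correlation-based distance, with notation assumed here for illustration: $Y^{(s)}$ and $Y^{(t)}$ are the $B \times C$ matrices of student and teacher class probabilities for a batch of $B$ samples and $C$ classes, and $\bar{u}$, $\bar{v}$ are the means of the vectors $u$ and $v$. As we understand the paper, the distance is applied both across the classes of each sample (inter-class relation) and across the samples of each class (intra-class relation):
+
+```math
+\begin{aligned}
+\rho_p(u, v) &= \frac{\sum_{i} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i} (u_i - \bar{u})^2}\,\sqrt{\sum_{i} (v_i - \bar{v})^2}}, \qquad d_p(u, v) = 1 - \rho_p(u, v), \\
+L_{inter} &= \frac{1}{B} \sum_{i=1}^{B} d_p\left(Y^{(s)}_{i,:},\ Y^{(t)}_{i,:}\right), \\
+L_{intra} &= \frac{1}{C} \sum_{j=1}^{C} d_p\left(Y^{(s)}_{:,j},\ Y^{(t)}_{:,j}\right).
+\end{aligned}
+```
+
+Because the Pearson correlation is invariant to shifting and positive scaling of its arguments, the student only needs to match the teacher's relative preferences rather than its exact probabilities, which is a looser constraint than KL divergence.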
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| DIST | ResNet18 | [resnet34_distill_resnet18_dist.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dist.yaml) | 71.99%(**+1.19%**) | - |
+
+
+##### 1.2.6.2 Configuration of DIST
+
+The DIST configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The teacher model has fixed parameters, and the pretrained parameters are loaded. In the `Loss` field, you need to define `DistillationDISTLoss` (DIST loss between student and teacher) and `DistillationGTCELoss` (CE loss with ground truth labels) as the training loss.
+
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  # if not null, its lengths should be same as models
+  pretrained_list:
+  # if not null, its lengths should be same as models
+  freeze_params_list:
+  - True
+  - False
+  models:
+    - Teacher:
+        name: ResNet34
+        pretrained: True
+
+    - Student:
+        name: ResNet18
+        pretrained: False
+
+  infer_model_name: "Student"
+
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - DistillationGTCELoss:
+        weight: 1.0
+        model_names: ["Student"]
+    - DistillationDISTLoss:
+        weight: 2.0
+        model_name_pairs:
+        - ["Student", "Teacher"]
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+<a name='1.2.7'></a>
+
+#### 1.2.7 MGD
+
+##### 1.2.7.1 Introduction to MGD
+
+Paper:
+
+
+> [Masked Generative Distillation](https://arxiv.org/abs/2205.01529)
+>
+> Zhendong Yang, Zhe Li, Mingqi Shao, Dachuan Shi, Zehuan Yuan, Chun Yuan
+>
+> ECCV 2022
+
+This method performs distillation on the feature map. During distillation, random masks are applied to the student features, and the student is forced to use the remaining partial features to generate the full features of the teacher model, which improves the representation ability of the student model. MGD achieves state-of-the-art performance on the feature distillation task, and has been widely verified to be effective in tasks such as detection and segmentation.
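+
+The masking and generation step can be sketched as follows. The notation is assumed here for illustration: $S$ and $T$ are the student and teacher feature maps with $C$ channels and spatial size $H \times W$, $f_{align}$ is a layer that matches the student channel count to the teacher's, $\mathcal{G}$ is a small generation network, $R_{i,j}$ is drawn uniformly from $(0, 1)$, and $\lambda$ is the mask ratio.
+
+```math
+\begin{aligned}
+M_{i,j} &= \begin{cases} 0, & R_{i,j} < \lambda \\ 1, & \text{otherwise} \end{cases} \\
+L_{MGD} &= \sum_{k=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( T_{k,i,j} - \mathcal{G}\left( f_{align}(S) \cdot M \right)_{k,i,j} \right)^2
+\end{aligned}
+```
+
+The student is therefore trained to reconstruct the teacher's full feature map from a randomly masked view of its own features, rather than to copy the teacher's features position by position.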
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| MGD | ResNet18 | [resnet34_distill_resnet18_mgd.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - |
+
+
+##### 1.2.7.2 Configuration of MGD
+
+The MGD configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The teacher model has fixed parameters, and the pretrained parameters are loaded. In the `Loss` field, you need to define `DistillationPairLoss` (MGD loss between student and teacher) and `DistillationGTCELoss` (CE loss with ground truth labels) as the training loss.
+
+```yaml
+Arch:
+  name: "DistillationModel"
+  class_num: &class_num 1000
+  # if not null, its lengths should be same as models
+  pretrained_list:
+  # if not null, its lengths should be same as models
+  freeze_params_list:
+  - True
+  - False
+  infer_model_name: "Student"
+  models:
+    - Teacher:
+        name: ResNet34
+        class_num: *class_num
+        pretrained: True
+        return_patterns: &t_stages ["blocks[2]", "blocks[6]", "blocks[12]", "blocks[15]"]
+    - Student:
+        name: ResNet18
+        class_num: *class_num
+        pretrained: False
+        return_patterns: &s_stages ["blocks[1]", "blocks[3]", "blocks[5]", "blocks[7]"]
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - DistillationGTCELoss:
+        weight: 1.0
+        model_names: ["Student"]
+    - DistillationPairLoss:
+        weight: 1.0
+        model_name_pairs: [["Student", "Teacher"]] # calculate mgdloss for Student and Teacher
+        name: "loss_mgd"
+        base_loss_name: MGDLoss # MGD loss, the following are parameters of 'MGD loss'
+        s_keys: ["blocks[7]"]   # feature map used to calculate MGD loss in student model
+        t_keys: ["blocks[15]"]  # feature map used to calculate MGD loss in teacher model
+        student_channels: 512   # channel num for student feature map
+        teacher_channels: 512   # channel num for teacher feature map
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+<a name='1.2.8'></a>
+
+#### 1.2.8 WSL
+
+##### 1.2.8.1 Introduction to WSL
+
+Paper:
+
+
+> [Rethinking Soft Labels For Knowledge Distillation: A Bias-variance Tradeoff Perspective](https://arxiv.org/abs/2102.00650)
+>
+> Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang
+>
+> ICLR, 2021
+
+The Weighted Soft Labels (WSL) loss function assigns a weight to the KD loss of each sample according to the ratio between the CE losses of the student model and the teacher model with respect to the ground-truth labels. If the student model predicts a certain sample better than the teacher model, a smaller weight is assigned to that sample. The method is simple and effective: it adaptively adjusts the weight of each sample, thereby improving distillation accuracy.
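+
+A sketch of the per-sample weighting, with notation assumed here for illustration: $L_{ce}^{s}$ and $L_{ce}^{t}$ are the cross-entropy losses of the student and the teacher on the ground-truth label of one sample, $L_{kd}$ is the usual KD loss of that sample, and $\alpha$ balances the two terms.
+
+```math
+\begin{aligned}
+w &= 1 - \exp\left( - \frac{L_{ce}^{s}}{L_{ce}^{t}} \right), \\
+L &= L_{ce}^{s} + \alpha \, w \, L_{kd}.
+\end{aligned}
+```
+
+When the student's loss is small relative to the teacher's, the ratio approaches 0 and so does $w$, so the soft labels of that sample contribute little. When the student is much worse than the teacher, $w$ approaches 1 and the sample receives nearly the full KD loss.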
+
+Performance on ImageNet1k is shown below.
+
+| Strategy | Backbone | Config | Top-1 acc | Download Link |
+| --- | --- | --- | --- | --- |
+| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| WSL | ResNet18 | [resnet34_distill_resnet18_wsl.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml) | 72.23%(**+1.43%**) | - |
+
+
+##### 1.2.8.2 Configuration of WSL
+
+The WSL configuration is shown below. In the `Arch` field, you need to define both the student model and the teacher model. The teacher model has fixed parameters, and the pretrained parameters are loaded. In the `Loss` field, you need to define `DistillationWSLLoss` (WSL loss between student and teacher) and `DistillationGTCELoss` (CE loss with ground truth labels) as the training loss.
+
+
+```yaml
+# model architecture
+Arch:
+  name: "DistillationModel"
+  # if not null, its lengths should be same as models
+  pretrained_list:
+  # if not null, its lengths should be same as models
+  freeze_params_list:
+  - True
+  - False
+  models:
+    - Teacher:
+        name: ResNet34
+        pretrained: True
+
+    - Student:
+        name: ResNet18
+        pretrained: False
+
+  infer_model_name: "Student"
+
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - DistillationGTCELoss:
+        weight: 1.0
+        model_names: ["Student"]
+    - DistillationWSLLoss:
+        weight: 2.5
+        model_name_pairs: [["Student", "Teacher"]]
+        temperature: 2
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+<a name="2"></a>
+
+## 2. Training, Evaluation and Prediction
+
+<a name="2.1"></a>  
+
+### 2.1 Environment Configuration
+
+* Installation: Please refer to [Paddle Installation Tutorial](../installation/install_paddle.md) and [PaddleClas Installation Tutorial](../../installation.md) to configure the running environment.
+
+<a name="2.2"></a>
+
+### 2.2 Data Preparation
+
+Please download the ImageNet-1k dataset from the [ImageNet website](https://www.image-net.org/).
+
+
+Enter the PaddleClas directory.
+
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory, name the downloaded data `ILSVRC2012` and store it there. The `ILSVRC2012` directory contains the following data:
+
+```
+├── train
+│   ├── n01440764
+│   │   ├── n01440764_10026.JPEG
+│   │   ├── n01440764_10027.JPEG
+├── train_list.txt
+...
+├── val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+├── val_list.txt
+```
+
+where `train/` and `val/` are the training set and the validation set, respectively, and `train_list.txt` and `val_list.txt` are the corresponding label files.
+
+
+If unlabeled data from a scenario similar to the training set is available, it can be organized in the same way as the labeled training data. Place the files in the same directory as the labeled dataset and set their label value to 0. Assuming the resulting label file is named `train_list_unlabel.txt`, you can use the following command to generate a label file for SSLD training.
+
+```shell
+cat train_list.txt train_list_unlabel.txt > train_list_all.txt
+```
+
+
+**Note:**
+
+* For more information about the format of `train_list.txt` and `val_list.txt`, you may refer to [Format Description of PaddleClas Classification Dataset](../single_label_classification/dataset.md#1-数据集格式说明) .
+
+
+<a name="2.3"></a>
+
+### 2.3 Model Training
+
+
+This section uses the SSLD knowledge distillation algorithm as an example to introduce the training, evaluation and prediction process for knowledge distillation. The configuration file is [PPLCNet_x2_5_ssld.yaml](../../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml). You can use the following command to train the model.
+
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml
+```
+
+<a name="2.4"></a>
+
+### 2.4 Model Evaluation
+
+After training the model, the following command can be used to evaluate the performance of the model:
+
+```bash
+python3 tools/eval.py \
+    -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model
+```
+
+where `-o Global.pretrained_model="output/DistillationModel/best_model"` specifies the path of the current optimal weights. If you need to specify other weights, you can simply replace the path.
+
+<a name="2.5"></a>
+
+### 2.5 Model Prediction
+
+After training is completed, the trained model can be loaded for prediction. A complete example is provided in `tools/infer.py`. You can use the model for prediction by executing the following command:
+
+```shell
+python3 tools/infer.py \
+    -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml \
+    -o Global.pretrained_model=output/DistillationModel/best_model
+```
+
+The outputs are as follows:
+
+```
+[{'class_ids': [8, 7, 86, 82, 21], 'scores': [0.87908, 0.12091, 0.0, 0.0, 0.0], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'kite']}]
+```
+
+
+**Note:**
+
+* Here `-o Global.pretrained_model="output/DistillationModel/best_model"` specifies the path of the current optimal weights. If you need to specify other weights, you can simply replace the path.
+
+* Image `docs/images/inference_deployment/whl_demo.jpg` is predicted by default. You can also predict other images by adding a field `-o Infer.infer_imgs=xxx`.
+
+
+<a name="2.6"></a>
+
+### 2.6 Model Export & Inference
+
+
+PaddleInference is a native inference library for PaddlePaddle, which can be used on servers and clouds to provide high-performance inference. PaddleInference can use MKLDNN, CUDNN, and TensorRT to accelerate model inference, thereby achieving better performance compared with inference based directly on the trained model. For more information about PaddleInference, please refer to [Paddle Inference Tutorial](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html).
+
+The model needs to be exported before inference. For models obtained from knowledge distillation, `-o Arch.infer_model_name=Student` should be specified when exporting to indicate that the model to be exported is the student model. The complete command is shown below.
+
+```shell
+python3 tools/export_model.py \
+    -c ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml \
+    -o Global.pretrained_model=./output/DistillationModel/best_model \
+    -o Arch.infer_model_name=Student
+```
+
+Three files will be generated in the `inference` directory: `inference.pdiparams`, `inference.pdiparams.info` and `inference.pdmodel`.
+
+For more information about model inference, please refer to: [Python Inference](../../deployment/image_classification/python.md).
+
+
+<a name="3"></a>
+
+## 3. References
+
+[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
+
+[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
+
+[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
+
+[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
+
+[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.
+
+[6] Cui C, Guo R, Du Y, et al. Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones[J]. arXiv preprint arXiv:2103.05959, 2021.
+
+[7] Zhang Y, Xiang T, Hospedales T M, et al. Deep mutual learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4320-4328.
+
+[8] Heo B, Kim J, Yun S, et al. A comprehensive overhaul of feature distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 1921-1930.
+
+[9] Du Y, Li C, Guo R, et al. PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System[J]. arXiv preprint arXiv:2109.03144, 2021.
+
+[10] Park W, Kim D, Lu Y, et al. Relational knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3967-3976.
+
+[11] Zhao B, Cui Q, Song R, et al. Decoupled Knowledge Distillation[J]. arXiv preprint arXiv:2203.08679, 2022.
+
+[12] Ji M, Heo B, Park S. Show, attend and distill: Knowledge distillation via attention-based feature matching[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(9): 7945-7952.
+
+[13] Huang T, You S, Wang F, et al. Knowledge Distillation from A Stronger Teacher[J]. arXiv preprint arXiv:2205.10536, 2022.
diff --git a/docs/en/models/PP-LCNetV2.md b/docs/en/models/PP-LCNetV2.md
new file mode 100644
index 0000000000..df5bd535f6
--- /dev/null
+++ b/docs/en/models/PP-LCNetV2.md
@@ -0,0 +1,79 @@
+# PP-LCNetV2 Series
+---
+
+## Catalogue
+
+- [1. Introduction](#1)
+- [2. Methods](#2)
+    - [2.1 Re-parameterization](#2.1)
+    - [2.2 PW Conv](#2.2)
+    - [2.3 Shortcut](#2.3)
+    - [2.4 Activation Function](#2.4)
+    - [2.5 SE](#2.5)
+- [3. Experiments](#3)
+
+<a name="1"></a>
+
+## 1. Introduction
+
+Although there are many lightweight models, few are specially optimized for the Intel CPU platform. We previously proposed [PPLCNetV1](PP-LCNet.md), which pursues faster inference speed, so its accuracy is limited and insufficient for complex tasks. Therefore, we now propose PPLCNetV2 to fill the gap of a higher-performance model on the CPU platform. Since OpenVINO is currently a widely used deployment framework on the Intel CPU platform, and for the sake of compatibility, we focus on optimizing the model for deployment on Intel CPUs with OpenVINO.
+
+<a name="2"></a>
+
+## 2. Methods
+
+![](../../../images/PP-LCNetV2/net.png)
+
+An overview of PPLCNetV2 is shown in the figure above. PPLCNetV2 is built on PPLCNet. The details of each optimization are described in this section.
+
+<a name="2.1"></a>
+
+### 2.1 Re-parameterization
+
+PPLCNetV2 contains many depthwise convolutions, so we optimize some of them with re-parameterization. The size of the convolution kernel determines the size of the model's receptive field, which affects the model's ability to capture more global or more local features. To help the model build features at different scales, we use 5\*5, 3\*3 and 1\*1 convolution kernels. The details are shown in the figure below.
+
+![](../../../images/PP-LCNetV2/rep.png)
+
+<a name="2.2"></a>
+
+### 2.2 PW Conv
+
+Deeper networks generally have stronger representation ability. So we replaced some pointwise convolution layers with two layers that squeeze and then expand the channel dimension of the feature, respectively. The details are shown in the figure below. In the end, we apply this optimization in the second-to-last block.
+
+![](../../../images/PP-LCNetV2/split_pw.png)
+
+<a name="2.3"></a>
+
+### 2.3 Shortcut
+
+Shortcuts are believed to alleviate the vanishing gradient problem, so they matter more as networks get deeper. However, shortcuts are generally used with caution in lightweight models because each one adds an elementwise addition and possibly extra memory access. We experimented with the influence of shortcuts at different stages of the model. In the end, we only use a shortcut in the last block, as shown in the figure below.
+
+![](../../../images/PP-LCNetV2/shortcut.png)
+
+<a name="2.4"></a>
+
+### 2.4 Activation Function
+
+In recent years, many activation functions have been proposed, such as ReLU, Swish, Hard-Swish and Mish, but they usually focus on improving model accuracy without considering the impact on model efficiency. In particular, some activation functions contain complex operations or are difficult for the inference platform to optimize, so they can seriously slow down the model. We experimented with different activation functions to evaluate their efficiency. We found that although ReLU yields slightly lower accuracy than H-Swish, the gap is very small for models of this size, while ReLU has a large speed advantage because it is simple and easy to optimize. Weighing both factors, we chose the ReLU activation function.
+
+<a name="2.5"></a>
+
+### 2.5 SE
+
+The SE module has attracted wide attention since it was proposed and has almost become a standard option in model design. It can significantly improve model accuracy by strengthening the model's channel attention. However, in lightweight models, the SE module not only improves accuracy but also increases inference latency. In PPLCNetV2, we use SE more sparingly, only in the penultimate blocks.
+
+<a name="3"></a>
+
+## 3. Experiments
+
+The accuracy on ImageNet1k, inference latency and download links of the pretrained models are listed below.
+
+| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | Pretrained model download link | Inference model download link |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| <b>PPLCNetV2_base</b>  | <b>6.6</b> | <b>604</b>  | <b>77.04</b> | <b>93.27</b> | <b>4.32</b> | [link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_pretrained.pdparams) | [link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar) |
+| <b>PPLCNetV2_base_ssld</b>  | <b>6.6</b> | <b>604</b>  | <b>80.07</b> | <b>94.87</b> | <b>4.32</b> | [link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_ssld_pretrained.pdparams) | [link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_ssld_infer.tar) |
+
+**Note:**
+
+* 1. `_ssld` denotes the model trained with `SSLD distillation`. For details about `SSLD distillation`, see [SSLD distillation](../advanced_tutorials/distillation/distillation_en.md);
+* 2. The latency is measured with the official OpenVINO benchmark tool on an Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz.

From adafe90ba6a7d50228a27f182b06a576a9b24153 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Mon, 31 Oct 2022 03:15:09 +0000
Subject: [PATCH 3/9] docs: update demo

---
 docs/en/inference_deployment/whl_deploy_en.md     | 4 ++--
 docs/zh_CN/deployment/image_classification/whl.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/en/inference_deployment/whl_deploy_en.md b/docs/en/inference_deployment/whl_deploy_en.md
index 7c94f6ded4..d726005a22 100644
--- a/docs/en/inference_deployment/whl_deploy_en.md
+++ b/docs/en/inference_deployment/whl_deploy_en.md
@@ -62,7 +62,7 @@ print(next(result))
 
 ```
 >>> result
-[{'class_ids': [8, 7, 136, 80, 84], 'scores': [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], 'label_names': ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']}]
+[{'class_ids': [8, 7, 86, 82, 80], 'scores': [0.97968, 0.02028, 3e-05, 1e-05, 0.0], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}]
 ```
 
 * CLI
@@ -72,7 +72,7 @@ paddleclas --model_name=ResNet50  --infer_imgs="docs/images/inference_deployment
 
 ```
 >>> result
-filename: docs/images/inference_deployment/whl_demo.jpg, top-5, class_ids: [8, 7, 136, 80, 84], scores: [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], label_names: ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']
+class_ids: [8, 7, 86, 82, 80], scores: [0.97968, 0.02028, 3e-05, 1e-05, 0.0], label_names: ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], filename: docs/images/inference_deployment/whl_demo.jpg
 Predict complete!
 ```
 
diff --git a/docs/zh_CN/deployment/image_classification/whl.md b/docs/zh_CN/deployment/image_classification/whl.md
index 38422bc0c0..89d47d0082 100644
--- a/docs/zh_CN/deployment/image_classification/whl.md
+++ b/docs/zh_CN/deployment/image_classification/whl.md
@@ -55,7 +55,7 @@ print(next(result))
 
 ```
 >>> result
-[{'class_ids': [8, 7, 136, 80, 84], 'scores': [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], 'label_names': ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']}]
+[{'class_ids': [8, 7, 86, 82, 80], 'scores': [0.97968, 0.02028, 3e-05, 1e-05, 0.0], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}]
 ```
 
 * 在命令行中使用
@@ -65,7 +65,7 @@ paddleclas --model_name=ResNet50  --infer_imgs="docs/images/inference_deployment
 
 ```
 >>> result
-filename: docs/images/inference_deployment/whl_demo.jpg, top-5, class_ids: [8, 7, 136, 80, 84], scores: [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], label_names: ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']
+class_ids: [8, 7, 86, 82, 80], scores: [0.97968, 0.02028, 3e-05, 1e-05, 0.0], label_names: ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'black grouse'], filename: docs/images/inference_deployment/whl_demo.jpg
 Predict complete!
 ```
 

From 8d2c396842e596899abe92fa755eb229e18bb07a Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Mon, 31 Oct 2022 09:45:51 +0000
Subject: [PATCH 4/9] docs: fix links

---
 .../{PP-LCNetV2.md => PP-LCNetV2_en.md}       | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
 rename docs/en/models/{PP-LCNetV2.md => PP-LCNetV2_en.md} (84%)

diff --git a/docs/en/models/PP-LCNetV2.md b/docs/en/models/PP-LCNetV2_en.md
similarity index 84%
rename from docs/en/models/PP-LCNetV2.md
rename to docs/en/models/PP-LCNetV2_en.md
index df5bd535f6..13abf6caa4 100644
--- a/docs/en/models/PP-LCNetV2.md
+++ b/docs/en/models/PP-LCNetV2_en.md
@@ -4,25 +4,25 @@
 ## Catalogue
 
 - [1. Introduction ](#2)
-- [2. Method](#3)
-    - [2.1 Rep 策略](#2.1)
-    - [2.2 PW 卷积](#2.2)
+- [2. Methods](#2)
+    - [2.1 Re-parameterization](#2.1)
+    - [2.2 PW Conv](#2.2)
     - [2.3 Shortcut](#2.3)
-    - [2.4 激活函数](#2.4)
-    - [2.5 SE 模块](#2.5)
+    - [2.4 Activation Function](#2.4)
+    - [2.5 SE](#2.5)
 - [3. Experiments](#3)
 
 <a name="1"></a>
 
 ### 1. Instroduction
 
-At present, although the lightweight models are so plentiful, there are few models specially optimized for Intel CPU platform. We have proposed [PPLCNetV1](PP-LCNet.md) , which pursues faster inference speed, so its performance is limited and the performance is insufficient when dealing with complex tasks. Therefore, we now propose PPLCNetV2 to fill the gap in the lack of a higher performance model on CPU platform. For the consideration of compatibility, OpenVINO is currently a widely used deployment framework in Intel CPU platform. Therefore, we focus on optimizing the model for the deployment scheme of Intel CPU with OpenVINO.
+Although lightweight models are plentiful, few are specially optimized for the Intel CPU platform. We previously proposed [PPLCNetV1](PP-LCNet_en.md), which pursues faster inference speed, so its accuracy is limited and insufficient for complex tasks. Therefore, we now propose PPLCNetV2 to fill the gap of a higher-performance model on the CPU platform. For compatibility, OpenVINO is currently a widely used deployment framework on the Intel CPU platform, so we focus on optimizing the model for deployment on Intel CPUs with OpenVINO.
 
 <a name="2"></a>
 
 ## 2. Methods
 
-![](../../../images/PP-LCNetV2/net.png)
+![](../../images/PP-LCNetV2/net.png)
 
 The overview of PPLCNetV2 is shown in the figure above. PPLCNetV2 is obtained on PPLCNet. The details of optimization tracks is shown in this section.
 
@@ -32,7 +32,7 @@ The overview of PPLCNetV2 is shown in the figure above. PPLCNetV2 is obtained on
 
 There are lots of depthwise convolution in PPLCNetV2, so we optimize some of the depthwise convolution by the re-parameterization. The size of the convolution kernel affects the size of the model's receptive field, which affects the model's ability to capture more global or local features. In order to help the model build different scales features, we use 5\*5, 3\*3 and 1\*1 size convolution kernel. The details is shown in the figure below.
 
-![](../../../images/PP-LCNetV2/rep.png)
+![](../../images/PP-LCNetV2/rep.png)
 
 <a name="2.2"></a>
 
@@ -40,7 +40,7 @@ There are lots of depthwise convolution in PPLCNetV2, so we optimize some of the
 
 We know that the network is more deeper, the model is more stronger. So we replaced some point convolution layer with two layers that squeeze and expand the channel dimensions of the feature, respectively. The details is shown in the figure below. Finally, we use this optimization method in the second last-to-last block.
 
-![](../../../images/PP-LCNetV2/split_pw.png)
+![](../../images/PP-LCNetV2/split_pw.png)
 
 <a name="2.3"></a>
 
@@ -48,7 +48,7 @@ We know that the network is more deeper, the model is more stronger. So we repla
 
 It is believed that the Shortcut can alleviate the vanishing gradient problem, so it is more important for the improvement of deep networks. However, Shortcut is generally used with caution in the lightweight models because it results in an elementwise addition operation and possibly memory access. We experimented on the influence of shortcut on the model at different stage. Finally, we only used Shortcut in the last block, as shown in the figure below.
 
-![](../../../images/PP-LCNetV2/shortcut.png)
+![](../../images/PP-LCNetV2/shortcut.png)
 
 <a name="2.4"></a>
 

From f5cfa7fc8c097717b17125ce515cce941407d9e2 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Mon, 31 Oct 2022 11:32:16 +0000
Subject: [PATCH 5/9] docs: fix invalid links

---
 .../knowledge_distillation_en.md              | 48 +++++++++++--------
 .../feature_extraction_en.md                  |  2 +-
 .../quick_start/quick_start_recognition_en.md |  5 +-
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/docs/en/advanced_tutorials/knowledge_distillation_en.md b/docs/en/advanced_tutorials/knowledge_distillation_en.md
index 82ee3a6e7a..18e8aa38cc 100644
--- a/docs/en/advanced_tutorials/knowledge_distillation_en.md
+++ b/docs/en/advanced_tutorials/knowledge_distillation_en.md
@@ -91,7 +91,7 @@ Paper:
 
 SSLD is a simple semi-supervised distillation method proposed by Baidu in 2021. By designing an improved JS divergence as the loss function and combining the data mining strategy based on ImageNet22k dataset, the accuracy of the 18 backbone network models was improved by more than 3% on average.
 
-For more information about the principle, model zoo and usage of SSLD, please refer to: [Introduction to SSLD](ssld.md).
+<!-- For more information about the principle, model zoo and usage of SSLD, please refer to: [Introduction to SSLD](ssld_en.md). -->
 
 
 ##### 1.2.1.2 Configuration of SSLD
@@ -152,8 +152,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - |
-| DML | PPLCNet_x2_5 | [PPLCNet_x2_5_dml.yaml](../../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml) | 76.68%(**+1.75%**) | - |
+| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - |
+| DML | PPLCNet_x2_5 | [PPLCNet_x2_5_dml.yaml](../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_dml.yaml) | 76.68%(**+1.75%**) | - |
 
 
 * Note: Complete PPLCNet_x2_5 The model have been trained for 360 epochs. For comparison, both baseline and DML have been trained for 100 epochs. Therefore, the accuracy is lower than the model (76.60%) opened on the official website.
@@ -210,8 +210,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - |
-| UDML | PPLCNet_x2_5 | [PPLCNet_x2_5_dml.yaml](../../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml) | 76.74%(**+1.81%**) | - |
+| baseline | PPLCNet_x2_5 | [PPLCNet_x2_5.yaml](../../../ppcls/configs/ImageNet/PPLCNet/PPLCNet_x2_5.yaml) | 74.93% | - |
+| UDML | PPLCNet_x2_5 | [PPLCNet_x2_5_udml.yaml](../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_udml.yaml) | 76.74%(**+1.81%**) | - |
 
 
 ##### 1.2.3.2 Configuration of UDML
@@ -262,7 +262,10 @@ Loss:
         weight: 1.0
 ```
 
-**Note(:** `return_patterns` are specified in the network above. The function of returning middle layer features is based on TheseusLayer. For more information about usage of TheseusLayer, please refer to: [Usage of TheseusLayer](theseus_layer.md).
+**Note:** `return_patterns` is specified in the network above. Returning middle-layer features is implemented on top of TheseusLayer.
+
+<!-- TODO(gaotingquan) -->
+<!-- For more information about usage of TheseusLayer, please refer to: [Usage of TheseusLayer](theseus_layer.md). -->
 
 
 <a name='1.2.4'></a>
@@ -286,8 +289,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
-| AFD | ResNet18 | [resnet34_distill_resnet18_afd.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_afd.yaml) | 71.68%(**+0.88%**) | - |
+| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| AFD | ResNet18 | [resnet34_distill_resnet18_afd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_afd.yaml) | 71.68%(**+0.88%**) | - |
 
 Note: In order to keep alignment with the training configuration in the paper, the number of training iterations is set to be 100 epochs, so the baseline accuracy is lower than the open source model accuracy in PaddleClas (71.0%).
 
@@ -374,7 +377,10 @@ Loss:
         weight: 1.0
 ```
 
-**Note(:** `return_patterns` are specified in the network above. The function of returning middle layer features is based on TheseusLayer. For more information about usage of TheseusLayer, please refer to: [Usage of TheseusLayer](theseus_layer.md).
+**Note:** `return_patterns` is specified in the network above. Returning middle-layer features is implemented on top of TheseusLayer.
+
+<!-- TODO(gaotingquan) -->
+<!-- For more information about usage of TheseusLayer, please refer to: [Usage of TheseusLayer](theseus_layer.md). -->
 
 <a name='1.2.5'></a>
 
@@ -397,8 +403,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
-| DKD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - |
+| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| DKD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - |
 
 
 ##### 1.2.5.2 Configuration of DKD
@@ -465,8 +471,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
-| DIST | ResNet18 | [resnet34_distill_resnet18_dist.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dist.yaml) | 71.99%(**+1.19%**) | - |
+| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| DIST | ResNet18 | [resnet34_distill_resnet18_dist.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dist.yaml) | 71.99%(**+1.19%**) | - |
 
 
 ##### 1.2.6.2 Configuration of DIST
@@ -531,8 +537,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
-| MGD | ResNet18 | [resnet34_distill_resnet18_mgd.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - |
+| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| MGD | ResNet18 | [resnet34_distill_resnet18_mgd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - |
 
 
 ##### 1.2.7.2 Configuration of MGD
@@ -603,8 +609,8 @@ Performance on ImageNet1k is shown below.
 
 | Strategy | Backbone | Config | Top-1 acc | Download Link |
 | --- | --- | --- | --- | --- |
-| baseline | ResNet18 | [ResNet18.yaml](../../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
-| WSL | ResNet18 | [resnet34_distill_resnet18_wsl.yaml](../../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml) | 72.23%(**+1.43%**) | - |
+| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
+| WSL | ResNet18 | [resnet34_distill_resnet18_wsl.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml) | 72.23%(**+1.43%**) | - |
 
 
 ##### 1.2.8.2 Configuration of WSL
@@ -657,7 +663,7 @@ Loss:
 
 ### 2.1 Environment Configuration
 
-* Installation: Please refer to [Paddle Installation Tutorial](../installation/install_paddle.md) and [PaddleClas Installation Tutorial](../../installation.md) to configure the running environment.
+* Installation: Please refer to [Paddle Installation Tutorial](../installation/install_paddle_en.md) and [PaddleClas Installation Tutorial](../installation/install_paddleclas_en.md) to configure the running environment.
 
 <a name="2.2"></a>
 
@@ -699,7 +705,7 @@ cat train_list.txt train_list_unlabel.txt > train_list_all.txt
 
 **Note:**
 
-* For more information about the format of `train_list.txt` and `val_list.txt`, you may refer to [Format Description of PaddleClas Classification Dataset](../single_label_classification/dataset.md#1-数据集格式说明) .
+* For more information about the format of `train_list.txt` and `val_list.txt`, you may refer to [Format Description of PaddleClas Classification Dataset](../data_preparation/classification_dataset_en.md#1dataset-format).
 
 
 <a name="2.3"></a>
@@ -707,7 +713,7 @@ cat train_list.txt train_list_unlabel.txt > train_list_all.txt
 ### 2.3 Model Training
 
 
-In this section, the process of model training, evaluation and prediction of knowledge distillation algorithm will be introduced using the SSLD knowledge distillation algorithm as an example. The configuration file is [PPLCNet_x2_5_ssld.yaml](../../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml). You can use the following command to complete the model training.
+This section uses the SSLD knowledge distillation algorithm as an example to introduce model training, evaluation and prediction for knowledge distillation. The configuration file is [PPLCNet_x2_5_ssld.yaml](../../../ppcls/configs/ImageNet/Distillation/PPLCNet_x2_5_ssld.yaml). You can use the following command to complete the model training.
 
 
 ```shell
@@ -776,7 +782,7 @@ python3 tools/export_model.py \
 
 3 files will be generated in `inference` directory: `inference.pdiparams`, `inference.pdiparams.info` and `inference.pdmodel`.
 
-For more information about model inference, please refer to: [Python Inference](../../deployment/image_classification/python.md).
+For more information about model inference, please refer to: [Python Inference](../inference_deployment/python_deploy_en.md).
 
 
 <a name="3"></a>
diff --git a/docs/en/image_recognition_pipeline/feature_extraction_en.md b/docs/en/image_recognition_pipeline/feature_extraction_en.md
index a3809b8afe..db2926863e 100644
--- a/docs/en/image_recognition_pipeline/feature_extraction_en.md
+++ b/docs/en/image_recognition_pipeline/feature_extraction_en.md
@@ -39,7 +39,7 @@ Functions of the above modules :
 
 #### 3.1 Backbone
 
-The Backbone part adopts [PP-LCNetV2_base](../models/PP-LCNetV2.md), which is based on `PPLCNet_V1`, including Rep strategy, PW convolution, Shortcut, activation function improvement, SE module improvement After several optimization points, the final classification accuracy is similar to `PPLCNet_x2_5`, and the inference delay is reduced by 40%<sup>*</sup>. During the experiment, we made appropriate improvements to `PPLCNetV2_base`, so that it can achieve higher performance in recognition tasks while keeping the speed basically unchanged, including: removing `ReLU` and ` at the end of `PPLCNetV2_base` FC`, change the stride of the last stage (RepDepthwiseSeparable) to 1.
+The Backbone part adopts [PP-LCNetV2_base](../models/PP-LCNetV2_en.md), which builds on `PPLCNet_V1` with several optimizations: the Rep strategy, PW convolution, Shortcut, an improved activation function and an improved SE module. The final classification accuracy is similar to `PPLCNet_x2_5`, and the inference latency is reduced by 40%<sup>*</sup>. During the experiment, we made appropriate improvements to `PPLCNetV2_base` so that it achieves higher performance in recognition tasks while keeping the speed basically unchanged, including removing the `ReLU` and `FC` at the end of `PPLCNetV2_base` and changing the stride of the last stage (RepDepthwiseSeparable) to 1.
 
 **Note:** <sup>*</sup>The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.
 
diff --git a/docs/en/quick_start/quick_start_recognition_en.md b/docs/en/quick_start/quick_start_recognition_en.md
index 1d93728de0..cc84ccf48f 100644
--- a/docs/en/quick_start/quick_start_recognition_en.md
+++ b/docs/en/quick_start/quick_start_recognition_en.md
@@ -124,7 +124,10 @@ Note: Since some decompression software has problems in decompressing the above
 
 The demo data download path of this chapter is as follows: [drink_dataset_v2.0.tar (drink data)](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar),
 
-The following takes **drink_dataset_v2.0.tar** as an example to introduce the PP-ShiTu quick start process on the PC. Users can also download and decompress the data of other scenarios to experience: [22 scenarios data download](../../zh_CN/introduction/ppshitu_application_scenarios.md#22-下载解压场景库数据).
+The following takes **drink_dataset_v2.0.tar** as an example to introduce the PP-ShiTu quick start process on the PC.
+
+<!-- TODO -->
+<!-- Users can also download and decompress the data of other scenarios to experience: [22 scenarios data download](../../zh_CN/introduction/ppshitu_application_scenarios.md#22-下载解压场景库数据). -->
 
 If you want to experience the server object detection and the recognition model of each scene, you can refer to [2.4 Server recognition model list](#24-list-of-server-identification-models)
 

From 58cd08c42ee00dca1b6455b5013abd8c0920fcf5 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Tue, 1 Nov 2022 03:11:19 +0000
Subject: [PATCH 6/9] docs: fix the error links

---
 README_ch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README_ch.md b/README_ch.md
index c2311f9f64..d6b093627f 100644
--- a/README_ch.md
+++ b/README_ch.md
@@ -95,8 +95,8 @@ PaddleClas支持多种前沿图像分类、识别相关算法，发布产业级
     - [向量检索](docs/zh_CN/deployment/PP-ShiTu/vector_search.md)
     - [哈希编码](docs/zh_CN/training/PP-ShiTu/deep_hashing.md)
   - PipeLine 推理部署
-    - [基于python预测引擎推理](docs/zh_CN/deployment/image_classification/python.md#2)
-    - [基于C++预测引擎推理](deploy/cpp_shitu/readme.md)
+    - [基于python预测引擎推理](docs/zh_CN/deployment/PP-ShiTu/python.md)
+    - [基于C++预测引擎推理](docs/zh_CN/deployment/PP-ShiTu/cpp.md)
     - [服务化部署](docs/zh_CN/deployment/PP-ShiTu/paddle_serving.md)
     - [端侧部署](docs/zh_CN/deployment/PP-ShiTu/paddle_lite.md)
     - [库管理工具](docs/zh_CN/deployment/PP-ShiTu/gallery_manager.md)

From 4891fe34a08a99641c257f0f56931f145c047f58 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Tue, 1 Nov 2022 11:57:06 +0000
Subject: [PATCH 7/9] docs: update

---
 docs/en/models/PP-LCNet_en.md | 501 ++++++++++++++++++++++++++--------
 1 file changed, 386 insertions(+), 115 deletions(-)

diff --git a/docs/en/models/PP-LCNet_en.md b/docs/en/models/PP-LCNet_en.md
index 12d43c9e15..652af892b5 100644
--- a/docs/en/models/PP-LCNet_en.md
+++ b/docs/en/models/PP-LCNet_en.md
@@ -1,95 +1,57 @@
 # PP-LCNet Series
----
-
-
-## Catalogue
-
-- [1. Abstract](#1)
-- [2. Introduction](#2)
-- [3. Method](#3)
-   - [3.1 Better Activation Function](#3.1)
-   - [3.2 SE Modules at Appropriate Positions](#3.2)
-   - [3.3 Larger Convolution Kernels](#3.3)
-   - [3.4 Larger Dimensional 1 × 1 Conv Layer after GAP](#3.4)
-- [4. Experiments](#4)
-   - [4.1 Image Classification](#4.1)
-   - [4.2 Object Detection](#4.2)
-   - [4.3 Semantic Segmentation](#4.3)
-- [5. Inference speed based on V100 GPU](#5)
-- [6. Inference speed based on SD855](#6)
-- [7. Conclusion](#7)
-- [8. Reference](#8)
-
-<a name="1"></a>
-## 1. Abstract
-
-In the field of computer vision, the quality of backbone network determines the outcome of the whole vision task. In previous studies, researchers generally focus on the optimization of FLOPs or Params, but inference speed actually serves as an importance indicator of model quality in real-world scenarios. Nevertheless, it is difficult to balance inference speed and accuracy. In view of various CPU-based applications in industry, we are now working to raise the adaptability of the backbone network to Intel CPU, so as to obtain a faster and more accurate lightweight backbone network. At the same time, the performance of downstream vision tasks such as object detection and semantic segmentation are also improved.
-
-<a name="2"></a>
-## 2. Introduction
 
-Recent years witnessed the emergence of many lightweight backbone networks. In past two years, in particular, there were abundant networks searched by NAS that either enjoy advantages on FLOPs or Params, or have an edge in terms of inference speed on ARM devices. However, few of them dedicated to specified optimization of Intel CPU, resulting their imperfect inference speed on the intel CPU side. Based on this, we specially design the backbone network PP-LCNet for Intel CPU devices with its acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone network can further improve the performance of the model without increasing the inference time, significantly outperforming the existing SOTA models. A comparison chart with other models is shown below.
-![](../../images/PP-LCNet/PP-LCNet-Acc.png)
+---
 
-<a name="3"></a>
-## 3. Method
+- [1. Introduction](#1)
+    - [1.1 Model Introduction](#1.1)
+    - [1.2 Model Details](#1.2)
+    - [1.3 Result](#1.3)
+- [2. Quick Start](#2)
+    - [2.1 PaddlePaddle Installation](#2.1)
+    - [2.2 PaddleClas Installation](#2.2)
+    - [2.3 Prediction](#2.3)
+- [3. Training, Evaluation and Inference](#3)
+    - [3.1 Installation](#3.1)
+    - [3.2 Dataset](#3.2)
+    - [3.3 Training](#3.3)
+      - [3.3.1 Train ImageNet](#3.3.1)
+      - [3.3.2 Fine-tuning based on ImageNet weights](#3.3.2)
+    - [3.4 Evaluation](#3.4)
+    - [3.5 Inference](#3.5)
+- [4. Inference Deployment](#4)
+  - [4.1 Getting Paddle Inference Model](#4.1)
+    - [4.1.1 Exporting Paddle Inference Model](#4.1.1)
+    - [4.1.2 Downloading Inference Model](#4.1.2)
+  - [4.2 Prediction with Python](#4.2)
+    - [4.2.1 Image Prediction](#4.2.1)
+    - [4.2.2 Images Prediction](#4.2.2)
+  - [4.3 Deployment with C++](#4.3)
+  - [4.4 Deployment as Service](#4.4)
+  - [4.5 Deployment on Mobile](#4.5)
+  - [4.6 Converting To ONNX and Deployment](#4.6)
+- [5. Reference](#5)
+
+<a name='1'></a>
+
+## 1. Introduction
+
+Recent years have witnessed the emergence of many lightweight backbone networks. In the past two years in particular, many networks found by NAS either have advantages in FLOPs or Params, or have an edge in inference speed on ARM devices. However, few of them are specifically optimized for Intel CPUs, so their inference speed on Intel CPUs is not ideal. Based on this, we specially designed the backbone network PP-LCNet for Intel CPU devices with the MKLDNN acceleration library. Compared with other lightweight SOTA models, this backbone network can further improve model performance without increasing inference time, significantly outperforming existing SOTA models.
+
+<a name='1.2'></a>
+
+### 1.2 Model Details
+
+Building on extensive experiments, we found that many seemingly inexpensive operations increase latency on Intel CPU-based devices, especially when the MKLDNN acceleration library is enabled. Finally, we summarized four strategies that improve model accuracy without increasing latency and combined them to form PP-LCNet.
 
 The overall structure of the network is shown in the figure below.
 ![](../../images/PP-LCNet/PP-LCNet.png)
 
-Build on extensive experiments, we found that many seemingly less time-consuming operations will increase the latency  on Intel CPU-based devices, especially when the MKLDNN acceleration library is enabled. Therefore, we finally chose a block with the leanest possible structure and the fastest possible speed to form our BaseNet (similar to MobileNetV1). Based on BaseNet, we summarized four strategies that can improve the accuracy of the model without increasing the latency, and we combined these four strategies to form PP-LCNet. Each of these four strategies is introduced as below:
-
-<a name="3.1"></a>
-### 3.1 Better Activation Function
-
-Since the adoption of ReLU activation function by convolutional neural network, the network performance has been improved substantially, and variants of the ReLU activation function have appeared in recent years, such as Leaky-ReLU, P-ReLU, ELU, etc. In 2017, Google Brain searched to obtain the swish activation function, which performs well on lightweight networks. In 2019, the authors of MobileNetV3 further optimized this activation function to H-Swish, which removes the exponential operation, leading to faster speed and an almost unaffected network accuracy. After many experiments, we also recognized its excellent performance on lightweight networks. Therefore, this activation function is adopted in PP-LCNet.
-
-<a name="3.2"></a>
-### 3.2 SE Modules at Appropriate Positions
-
-The SE module is a channel attention mechanism proposed by SENet, which can effectively improve the accuracy of the model. However, on the Intel CPU side, the module also presents a large latency, leaving us the task of balancing accuracy and speed. The search of the location of the SE module in NAS search-based networks such as MobileNetV3 brings no general conclusions, but we found through our experiments that the closer the SE module is to the tail of the network the greater the improvement in model accuracy. The following table also shows some of our experimental results：
+<a name='1.3'></a>
 
-| SE Location       | Top-1 Acc(\%) | Latency(ms) |
-|-------------------|---------------|-------------|
-| 1100000000000     | 61.73           | 2.06         |
-| 0000001100000     | 62.17           | 2.03         |
-| <b>0000000000011<b>     | <b>63.14<b>           | <b>2.05<b>         |
-| 1111111111111     | 64.27           | 3.80         |
+### 1.3 Result
 
-The option in the third row of the table was chosen for the location of the SE module in PP-LCNet.
-
-<a name="3.3"></a>
-### 3.3 Larger Convolution Kernels
-
-In the paper of MixNet, the author analyzes the effect of convolutional kernel size on model performance and concludes that larger convolutional kernels within a certain range can improve the performance of the model, but beyond this range will be detrimental to the model’s performance. So the author forms MixConv with split-concat paradigm combined, which can improve the performance of the model but is not conducive to inference. We experimentally summarize the role of some larger convolutional kernels at different positions that are similar to those of the SE module, and find that larger convolutional kernels display more prominent roles in the middle and tail of the network. The following table shows the effect of the position of the 5x5 convolutional kernels on the accuracy：
-
-| Larger Convolution Location       | Top-1 Acc(\%) | Latency(ms) |
-|----------------------------|---------------|-------------|
-| 1111111111111     | 63.22           | 2.08         |
-| 1111111000000     | 62.70           | 2.07        |
-| <b>0000001111111<b>     | <b>63.14<b>           | <b>2.05<b>         |
-
-
-Experiments show that a larger convolutional kernel placed at the middle and tail of the network can achieve the same accuracy as placed at all positions, coupled with faster inference. The option in the third row of the table was the final choice of PP-LCNet.
-
-<a name="3.4"></a>
-### 3.4 Larger Dimensional 1 × 1 Conv Layer after GAP
-
-Since the introduction of GoogLeNet, GAP (Global-Average-Pooling) is often directly followed by a classification layer, which fails to result in further integration and processing of features extracted after GAP in the lightweight network. If a larger 1x1 convolutional layer (equivalent to the FC layer) is used after GAP, the extracted features, instead of directly passing through the classification layer, will first be integrated, and then classified. This can greatly improve the accuracy rate without affecting the inference speed of the model. The above four improvements were made to BaseNet to obtain PP-LCNet. The following table further illustrates the impact of each scheme on the results：
-
-| Activation | SE-block | Large-kernal | last-1x1-conv | Top-1 Acc(\%) | Latency(ms) |
-|------------|----------|--------------|---------------|---------------|-------------|
-| 0       | 1       | 1               | 1                | 61.93 | 1.94 |
-| 1       | 0       | 1               | 1                | 62.51 | 1.87 |
-| 1       | 1       | 0               | 1                | 62.44 | 2.01 |
-| 1       | 1       | 1               | 0                | 59.91 | 1.85 |
-| <b>1<b>       | <b>1<b>       | <b>1<b>               | <b>1<b>                | <b>63.14<b> | <b>2.05<b> |
-
-<a name="4"></a>
-## 4. Experiments
-
-<a name="4.1"></a>
-### 4.1 Image Classification
+<a name="1.3.1"></a>
+### 1.3.1 Image Classification
 
 For image classification, ImageNet dataset is adopted. Compared with the current mainstream lightweight network, PP-LCNet can obtain faster inference speed with the same accuracy. When using Baidu’s self-developed SSLD distillation strategy, the accuracy is further improved, with the Top-1 Acc of ImageNet exceeding 80% at an inference speed of about 5ms on the Intel CPU side.
 
@@ -126,8 +88,36 @@ Performance comparison with other lightweight networks:
 | MobileNetV3_small_x1_25  | 3.6 | 100  | 70.67 | 89.51 | 3.95 |
 | <b>PPLCNet_x1_0<b>     |<b> 3.0<b> | <b>161<b> | <b>71.32<b> | <b>90.03<b> | <b>2.46<b> |
 
-<a name="4.2"></a>
-### 4.2 Object Detection
+We also test the inference speed of PPLCNet on other devices:
+
+* Inference speed based on V100 GPU
+
+| Models        | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) | FP32<br/>Batch Size=4<br/>(ms)   | FP32<br/>Batch Size=8<br/>(ms) |
+| ------------- | --------- | ----------------- | ---------------------------- | -------------------------------- | ------------------------------ |
+| PPLCNet_x0_25 | 224       | 256               | 0.72                         | 1.17                             | 1.71                           |
+| PPLCNet_x0_35 | 224       | 256               | 0.69                         | 1.21                             | 1.82                           |
+| PPLCNet_x0_5  | 224       | 256               | 0.70                         | 1.32                             | 1.94                           |
+| PPLCNet_x0_75 | 224       | 256               | 0.71                         | 1.49                             | 2.19                           |
+| PPLCNet_x1_0  | 224       | 256               | 0.73                         | 1.64                             | 2.53                           |
+| PPLCNet_x1_5  | 224       | 256               | 0.82                         | 2.06                             | 3.12                           |
+| PPLCNet_x2_0  | 224       | 256               | 0.94                         | 2.58                             | 4.08                           |
+
+* Inference speed based on SD855
+
+| Models        | SD855 time(ms)<br>bs=1, thread=1 | SD855 time(ms)<br/>bs=1, thread=2 | SD855 time(ms)<br/>bs=1, thread=4 |
+| ------------- | -------------------------------- | --------------------------------- | --------------------------------- |
+| PPLCNet_x0_25 | 2.30                             | 1.62                              | 1.32                              |
+| PPLCNet_x0_35 | 3.15                             | 2.11                              | 1.64                              |
+| PPLCNet_x0_5  | 4.27                             | 2.73                              | 1.92                              |
+| PPLCNet_x0_75 | 7.38                             | 4.51                              | 2.91                              |
+| PPLCNet_x1_0  | 10.78                            | 6.49                              | 3.98                              |
+| PPLCNet_x1_5  | 20.55                            | 12.26                             | 7.54                              |
+| PPLCNet_x2_0  | 33.79                            | 20.17                             | 12.10                             |
+| PPLCNet_x2_5  | 49.89                            | 29.60                             | 17.82                             |
+
+<a name="1.3.2"></a>
+
+### 1.3.2 Object Detection
 
 For object detection, we adopt Baidu’s self-developed PicoDet, which focuses on lightweight object detection scenarios. The following table shows the comparison between the results of PP-LCNet and MobileNetV3 on the COCO dataset. PP-LCNet has an obvious advantage in both accuracy and speed.
 
@@ -138,8 +128,9 @@ MobileNetV3_large_x0_35 | 19.2 | 8.1 |
 MobileNetV3_large_x0_75 | 25.8 | 11.1 |
 <b>PPLCNet_x1_0<b> | <b>26.9<b> | <b>7.9<b> |
 
-<a name="4.3"></a>
-### 4.3 Semantic Segmentation
+<a name="1.3.3"></a>
+
+### 1.3.3 Semantic Segmentation
 
 For semantic segmentation, DeeplabV3+ is adopted. The following table presents the comparison between PP-LCNet and MobileNetV3 on the Cityscapes dataset, and PP-LCNet also stands out in terms of accuracy and speed.
 
@@ -150,42 +141,322 @@ MobileNetV3_large_x0_5 | 55.42 | 135 |
 MobileNetV3_large_x0_75 | 64.53 | 151 |
 <b>PPLCNet_x1_0<b> | <b>66.03<b> | <b>96<b> |
 
-<a name="5"></a>
-## 5. Inference speed based on V100 GPU
 
-| Models        | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) | FP32<br/>Batch Size=1\4<br/>(ms) | FP32<br/>Batch Size=8<br/>(ms) |
-| ------------- | --------- | ----------------- | ---------------------------- | -------------------------------- | ------------------------------ |
-| PPLCNet_x0_25 | 224       | 256               | 0.72                         | 1.17                             | 1.71                           |
-| PPLCNet_x0_35 | 224       | 256               | 0.69                         | 1.21                             | 1.82                           |
-| PPLCNet_x0_5  | 224       | 256               | 0.70                         | 1.32                             | 1.94                           |
-| PPLCNet_x0_75 | 224       | 256               | 0.71                         | 1.49                             | 2.19                           |
-| PPLCNet_x1_0  | 224       | 256               | 0.73                         | 1.64                             | 2.53                           |
-| PPLCNet_x1_5  | 224       | 256               | 0.82                         | 2.06                             | 3.12                           |
-| PPLCNet_x2_0  | 224       | 256               | 0.94                         | 2.58                             | 4.08                           |
+<a name="2"></a>
 
-<a name="6"></a>
+## 2. Quick Start
 
-## 6. Inference speed based on SD855
+<a name="2.1"></a>  
 
-| Models        | SD855 time(ms)<br>bs=1, thread=1 | SD855 time(ms)<br/>bs=1, thread=2 | SD855 time(ms)<br/>bs=1, thread=4 |
-| ------------- | -------------------------------- | --------------------------------- | --------------------------------- |
-| PPLCNet_x0_25 | 2.30                             | 1.62                              | 1.32                              |
-| PPLCNet_x0_35 | 3.15                             | 2.11                              | 1.64                              |
-| PPLCNet_x0_5  | 4.27                             | 2.73                              | 1.92                              |
-| PPLCNet_x0_75 | 7.38                             | 4.51                              | 2.91                              |
-| PPLCNet_x1_0  | 10.78                            | 6.49                              | 3.98                              |
-| PPLCNet_x1_5  | 20.55                            | 12.26                             | 7.54                              |
-| PPLCNet_x2_0  | 33.79                            | 20.17                             | 12.10                             |
-| PPLCNet_x2_5  | 49.89                            | 29.60                             | 17.82                             |
+### 2.1 PaddlePaddle Installation
+
+- Run the following command to install if CUDA9 or CUDA10 is available.
+
+```bash
+python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
+```
+
+- Run the following command to install if GPU device is unavailable.
+
+```bash
+python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html) for more information about installation, for example how to install other versions.
+
+<a name="2.2"></a>  
+
+### 2.2 PaddleClas wheel Installation
+
+Install PaddleClas with the following command:
+
+```  
+pip3 install paddleclas
+```
+
+<a name="2.3"></a>
+
+### 2.3 Prediction
+
+* Prediction with CLI
+
+```bash
+paddleclas --model_name=PPLCNet_x1_0  --infer_imgs="docs/images/inference_deployment/whl_demo.jpg"
+```
+
+Results:
+```
+>>> result
+class_ids: [8, 7, 86, 81, 85], scores: [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], label_names: ['hen', 'cock', 'partridge', 'ptarmigan', 'quail'], filename: docs/images/inference_deployment/whl_demo.jpg
+Predict complete!
+```  
+
+**Note**: To use a PPLCNet model of a different scale, just change `model_name`. For example, to switch to `PPLCNet_x0_25`, change `--model_name=PPLCNet_x1_0` to `--model_name=PPLCNet_x0_25`.
+
+
+* Prediction in Python
+```python
+from paddleclas import PaddleClas
+clas = PaddleClas(model_name='PPLCNet_x1_0')
+infer_imgs = 'docs/images/deployment/whl_demo.jpg'
+result = clas.predict(infer_imgs)
+print(next(result))
+```
+
+**Note**: The result returned by model.predict() is a `generator`, so you need to use the `next()` function to call it or `for loop` to loop it. And it will predict with batch_size size batch and return the prediction results when called. The default batch_size is 1, and you also specify the batch_size when instantiating, such as `model = paddleclas.PaddleClas(model_name="PPLCNet_x1_0", batch_size=2)`.
+
+<a name="3"></a>
+
+## 3. Training, Evaluation and Inference
+
+<a name="3.1"></a>  
+
+### 3.1 Installation
+
+Please refer to [Installation](../installation/install_paddleclas_en.md) for installation instructions.
+
+<a name="3.2"></a>
+
+### 3.2 Dataset
+
+Please download the ImageNet-1k data from the [ImageNet official website](https://www.image-net.org/).
+
+Enter the `PaddleClas/` directory:
 
+```
+cd path_to_PaddleClas
+```
+
+Enter the `dataset/` directory and name the downloaded data `ILSVRC2012`. The `ILSVRC2012` directory should contain the following data:
+
+```
+├── train
+│   ├── n01440764
+│   │   ├── n01440764_10026.JPEG
+│   │   ├── n01440764_10027.JPEG
+├── train_list.txt
+...
+├── val
+│   ├── ILSVRC2012_val_00000001.JPEG
+│   ├── ILSVRC2012_val_00000002.JPEG
+├── val_list.txt
+```
+
+where `train/` and `val/` are the training set and validation set, respectively. `train_list.txt` and `val_list.txt` are the label files for the training set and validation set, respectively.
+
+**Note:**
+* For the content format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
+
+
+<a name="3.3"></a>
+
+### 3.3 Training
+
+<a name="3.3.1"></a>
+
+#### 3.3.1 Train ImageNet
+
+The PPLCNet_x1_0 training configuration is provided in `ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml`, which can be started with the following script:  
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml
+```
+
+
+**Note:**
+
+* The current model with the best accuracy will be saved in `output/PPLCNet_x1_0/best_model.pdparams`
+
+<a name="3.3.2"></a>
+
+#### 3.3.2 Fine-tuning based on ImageNet weights
+
+If you are not training an ImageNet task, you need to change the configuration file and training method, such as reducing the learning rate, reducing the number of epochs, etc.
+
+<a name="3.4"></a>
+
+### 3.4 Evaluation
+
+After training, you can use the following commands to evaluate the model.
+
+```bash
+python3 tools/eval.py \
+    -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file. You can specify another path if needed.
 
-<a name="7"></a>
-## 7. Conclusion
+<a name="3.5"></a>
+
+### 3.5 Inference
+
+After the model training is completed, the pre-trained model obtained from the training can be loaded for model prediction. A complete example is provided in the `tools/infer.py` of the model library, and the model prediction can be done by simply executing the following command:
+
+```bash
+python3 tools/infer.py \
+    -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+```
+
+The results:
+```
+[{'class_ids': [8, 7, 86, 81, 85], 'scores': [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ptarmigan', 'quail']}]
+```
+
+**Note**:
+
+* In the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specifies the path of the best model weight file. You can specify another path if needed.
+
+
+* The default test image is `docs/images/inference_deployment/whl_demo.jpg`. To test another image, specify the argument `-o Infer.infer_imgs=path_to_test_image`.
+
+* The default output is the value of Top-5. If you want to output the value of Top-k, you can specify `-o Infer.PostProcess.topk=k`, where `k` is the value you specify.
+
+* The default label mapping is based on the ImageNet dataset. If you change the dataset, you need to re-specify `Infer.PostProcess.class_id_map_file`. For the method of making the mapping file, please refer to `ppcls/utils/imagenet1k_label_list.txt`
+
+
+<a name="4"></a>
+
+## 4. Inference Deployment
+
+<a name="4.1"></a>
+
+### 4.1 Getting Paddle Inference Model
+
+Paddle Inference is the native inference library of PaddlePaddle and provides high-performance inference for server deployment. Compared with predicting directly from the pretrained model, Paddle Inference can use acceleration tools to achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
+
+Paddle Inference requires a Paddle Inference Model for prediction. There are two ways to get one. If you want to use the model provided by PaddleClas, you can download it directly; see [Downloading Inference Model](#4.1.2).
+
+
+<a name="4.1.1"></a>
+
+#### 4.1.1 Exporting Paddle Inference Model
+
+The command for exporting the Paddle Inference Model is as follows:
+
+```bash
+python3 tools/export_model.py \
+    -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml \
+    -o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
+    -o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_infer
+```
 
-Rather than holding on to perfect FLOPs and Params as academics do, PP-LCNet focuses on analyzing how to add Intel CPU-friendly modules to improve the performance of the model, which can better balance accuracy and inference time. The experimental conclusions therein are available to other researchers in network structure design, while providing NAS search researchers with a smaller search space and general conclusions. The finished PP-LCNet can also be better accepted and applied in industry.
+After running the above command, the inference model files are saved in `deploy/models/PPLCNet_x1_0_infer`, as shown below:
 
-<a name="8"></a>
-## 8. Reference
+```
+├── PPLCNet_x1_0_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+
+<a name="4.1.2"></a>
+
+#### 4.1.2 Downloading Inference Model
+
+You can also download the inference model directly.
+
+```
+cd deploy/models
+# download the inference model and decompress it
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar && tar -xf PPLCNet_x1_0_infer.tar
+```
+
+After decompression, the `models` directory should look as follows.
+```
+├── PPLCNet_x1_0_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+<a name="4.2"></a>
+
+### 4.2 Prediction with Python
+
+
+<a name="4.2.1"></a>  
+
+#### 4.2.1 Image Prediction
+
+Return to the `deploy` directory:
+
+```
+cd ../
+```
+
+Run the following command to classify the image `./images/ImageNet/ILSVRC2012_val_00000010.jpeg`.
+
+```shell
+# Use the following command to predict with GPU.
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNet_x1_0_infer
+# Use the following command to predict with CPU.
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNet_x1_0_infer -o Global.use_gpu=False
+```
+
+The prediction results:
+
+```
+ILSVRC2012_val_00000010.jpeg:   class id(s): [153, 265, 204, 283, 229], score(s): [0.61, 0.11, 0.05, 0.03, 0.02], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'toy poodle', 'Lhasa, Lhasa apso', 'Persian cat', 'Old English sheepdog, bobtail']
+```
+
+<a name="4.2.2"></a>  
+
+#### 4.2.2 Images Prediction
+
+If you want to predict all images in a directory, set the argument `Global.infer_imgs` to the directory path with `-o Global.infer_imgs`. The command is as follows.
+
+```shell
+# Use the following command to predict with GPU. To predict with CPU instead, add the argument -o Global.use_gpu=False
+python3 python/predict_cls.py -c configs/inference_cls.yaml -o Global.inference_model_dir=models/PPLCNet_x1_0_infer -o Global.infer_imgs=images/ImageNet/
+```
+
+The terminal will output the classification results of all images in the directory, as shown below.
+
+```
+ILSVRC2012_val_00000010.jpeg:   class id(s): [153, 265, 204, 283, 229], score(s): [0.61, 0.11, 0.05, 0.03, 0.02], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'toy poodle', 'Lhasa, Lhasa apso', 'Persian cat', 'Old English sheepdog, bobtail']
+ILSVRC2012_val_00010010.jpeg:   class id(s): [695, 551, 507, 531, 419], score(s): [0.11, 0.06, 0.03, 0.03, 0.03], label_name(s): ['padlock', 'face powder', 'combination lock', 'digital watch', 'Band Aid']
+ILSVRC2012_val_00020010.jpeg:   class id(s): [178, 211, 209, 210, 236], score(s): [0.87, 0.03, 0.01, 0.00, 0.00], label_name(s): ['Weimaraner', 'vizsla, Hungarian pointer', 'Chesapeake Bay retriever', 'German short-haired pointer', 'Doberman, Doberman pinscher']
+ILSVRC2012_val_00030010.jpeg:   class id(s): [80, 23, 93, 81, 99], score(s): [0.87, 0.01, 0.01, 0.01, 0.00], label_name(s): ['black grouse', 'vulture', 'hornbill', 'ptarmigan', 'goose']
+```
+
+<a name="4.3"></a>
+
+### 4.3 Deployment with C++
+
+PaddleClas provides an example about how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
+
+<a name="4.4"></a>
+
+### 4.4 Deployment as Service
+
+Paddle Serving is a flexible, high-performance carrier for machine learning models. It supports different protocols, such as RESTful, gRPC, and bRPC, and provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer to [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
+
+PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
+
+<a name="4.5"></a>
+
+### 4.5 Deployment on Mobile
+
+Paddle-Lite is an open source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
+
+PaddleClas provides an example of how to deploy on mobile by Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
+
+<a name="4.6"></a>
+
+### 4.6 Converting To ONNX and Deployment
+
+Paddle2ONNX supports converting a Paddle Inference model to an ONNX model. You can then deploy the ONNX model on different inference engines, such as TensorRT, OpenVINO, MNN/TNN, and NCNN. For details about Paddle2ONNX, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
+
+PaddleClas provides an example of how to convert Paddle Inference model to ONNX model by paddle2onnx toolkit and predict by ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
+
+<a name="5"></a>
+## 5. Reference
 
 Reference to cite when you use PP-LCNet in a paper:
 ```

From 47755089698f20884a11e79dc3bcf8cc675f5b38 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Tue, 1 Nov 2022 13:18:49 +0000
Subject: [PATCH 8/9] docs: fix

---
 docs/en/PPShiTu/PPShiTuV2_introduction.md |  2 +-
 docs/en/models/PP-HGNet_en.md             | 19 ++++++-------------
 docs/zh_CN/models/PP-ShiTu/README.md      |  2 +-
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/docs/en/PPShiTu/PPShiTuV2_introduction.md b/docs/en/PPShiTu/PPShiTuV2_introduction.md
index 2ad62acdb4..d33b6c404f 100644
--- a/docs/en/PPShiTu/PPShiTuV2_introduction.md
+++ b/docs/en/PPShiTu/PPShiTuV2_introduction.md
@@ -163,7 +163,7 @@ cd deploy/models
 wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
 
 # Download the feature extraction inference model and unzip it
-wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.
+wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
 ```
 
 ### 4.2 Test data preparation
diff --git a/docs/en/models/PP-HGNet_en.md b/docs/en/models/PP-HGNet_en.md
index 2babd1fdf5..3c664351ec 100644
--- a/docs/en/models/PP-HGNet_en.md
+++ b/docs/en/models/PP-HGNet_en.md
@@ -153,13 +153,7 @@ result = clas.predict(infer_imgs)
 print(next(result))
 ```
 
-**Note**: The result returned by model.predict() is a `generator`, so you need to use the `next()` function to call it or `for loop` to loop it. And it will predict with batch_size size batch and return the prediction results when called. The default batch_size is 1, and you also specify the batch_size when instantiating, such as model = paddleclas.PaddleClas(model_name="PPHGNet_small", batch_size=2). The result of demo above:
-
-
-```
->>> result
-[{'class_ids': [8, 7, 86, 82, 81], 'scores': [0.71479, 0.08682, 0.00806, 0.0023, 0.00121], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'ptarmigan'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}]
-```
+**Note**: The result returned by model.predict() is a `generator`, so you need to use the `next()` function to call it or `for loop` to loop it. And it will predict with batch_size size batch and return the prediction results when called. The default batch_size is 1, and you also specify the batch_size when instantiating, such as `model = paddleclas.PaddleClas(model_name="PPHGNet_small", batch_size=2)`.
 
 <a name="3"></a>
 
@@ -212,7 +206,7 @@ where `train/` and `val/` are the training set and validation set, respectively.
 
 #### 3.3.1 Train ImageNet
 
-The PPHGNet_small training configuration is provided in `ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml`, which can be started with the following script:    
+The PPHGNet_small training configuration is provided in `ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml`, which can be started with the following script:  
 
 ```shell
 export CUDA_VISIBLE_DEVICES=0,1,2,3
@@ -226,12 +220,12 @@ python3 -m paddle.distributed.launch \
 **Note:**
 
 * The current model with the best accuracy will be saved in `output/PPHGNet_small/best_model.pdparams`
-    
+
 <a name="3.3.2"></a>
 
 #### 3.3.2 Fine-tuning based on ImageNet weights
 
-If you are not training an ImageNet task, you need to change the configuration file and training method, such as reducing the learning rate, reducing the number of epochs, etc. 
+If you are not training an ImageNet task, you need to change the configuration file and training method, such as reducing the learning rate, reducing the number of epochs, etc.
 
 <a name="3.4"></a>
 
@@ -265,13 +259,13 @@ The results:
 
 **Note**:
 
-* Among the above command, argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specify the path of the best model weight file. You can specify other path if needed.
+* Among the above command, argument `-o Global.pretrained_model="output/PPHGNet_small/best_model"` specify the path of the best model weight file. You can specify other path if needed.
 
 
 * The default test image is `docs/images/inference_deployment/whl_demo.jpg` ，And you can test other image, only need to specify the argument `-o Infer.infer_imgs=path_to_test_image`.
 
 * The default output is the value of Top-5. If you want to output the value of Top-k, you can specify `-o Infer.PostProcess.topk=k`, where `k` is the value you specify.
-   
+
 * The default label mapping is based on the ImageNet dataset. If you change the dataset, you need to re-specify `Infer.PostProcess.class_id_map_file`. For the method of making the mapping file, please refer to `ppcls/utils/imagenet1k_label_list.txt`
 
 
@@ -410,4 +404,3 @@ PaddleClas provides an example of how to deploy on mobile by Paddle-Lite. Please
 Paddle2ONNX support convert Paddle Inference model to ONNX model. And you can deploy with ONNX model on different inference engine, such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. About Paddle2ONNX details, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
 
 PaddleClas provides an example of how to convert Paddle Inference model to ONNX model by paddle2onnx toolkit and predict by ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
-
diff --git a/docs/zh_CN/models/PP-ShiTu/README.md b/docs/zh_CN/models/PP-ShiTu/README.md
index a031e8ef07..01b64a1b71 100644
--- a/docs/zh_CN/models/PP-ShiTu/README.md
+++ b/docs/zh_CN/models/PP-ShiTu/README.md
@@ -167,7 +167,7 @@ cd deploy/models
 wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
 
 # 下载特征提取inference模型并解压
-wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.
+wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
 ```
 
 ### 4.2 测试数据准备

From 6d8ff9342afdace70c1a9e7567b3b04f2d92c583 Mon Sep 17 00:00:00 2001
From: gaotingquan <gaotingquan@baidu.com>
Date: Tue, 1 Nov 2022 13:32:58 +0000
Subject: [PATCH 9/9] docs: update demo result

---
 docs/en/models/PP-HGNet_en.md | 9 ++++++++-
 docs/en/models/PP-LCNet_en.md | 9 ++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/en/models/PP-HGNet_en.md b/docs/en/models/PP-HGNet_en.md
index 3c664351ec..0184e15855 100644
--- a/docs/en/models/PP-HGNet_en.md
+++ b/docs/en/models/PP-HGNet_en.md
@@ -148,11 +148,18 @@ Predict complete!
 ```python
 from paddleclas import PaddleClas
 clas = PaddleClas(model_name='PPHGNet_small')
-infer_imgs = 'docs/images/deployment/whl_demo.jpg'
+infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg'
 result = clas.predict(infer_imgs)
 print(next(result))
 ```
 
+The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [8, 7, 86, 82, 81], 'scores': [0.77132, 0.05122, 0.00755, 0.00199, 0.00115], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'ptarmigan'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}]
+```
+
 **Note**: The result returned by model.predict() is a `generator`, so you need to use the `next()` function to call it or `for loop` to loop it. And it will predict with batch_size size batch and return the prediction results when called. The default batch_size is 1, and you also specify the batch_size when instantiating, such as `model = paddleclas.PaddleClas(model_name="PPHGNet_small", batch_size=2)`.
 
 <a name="3"></a>
diff --git a/docs/en/models/PP-LCNet_en.md b/docs/en/models/PP-LCNet_en.md
index 652af892b5..d5786996d8 100644
--- a/docs/en/models/PP-LCNet_en.md
+++ b/docs/en/models/PP-LCNet_en.md
@@ -198,11 +198,18 @@ Predict complete!
 ```python
 from paddleclas import PaddleClas
 clas = PaddleClas(model_name='PPLCNet_x1_0')
-infer_imgs = 'docs/images/deployment/whl_demo.jpg'
+infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg'
 result = clas.predict(infer_imgs)
 print(next(result))
 ```
 
+The result of the demo above:
+
+```
+>>> result
+[{'class_ids': [8, 7, 86, 81, 85], 'scores': [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], 'label_names': ['hen', 'cock', 'partridge', 'ptarmigan', 'quail'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}]
+```
+
 **Note**: The result returned by model.predict() is a `generator`, so you need to use the `next()` function to call it or `for loop` to loop it. And it will predict with batch_size size batch and return the prediction results when called. The default batch_size is 1, and you also specify the batch_size when instantiating, such as `model = paddleclas.PaddleClas(model_name="PPLCNet_x1_0", batch_size=2)`.
 
 <a name="3"></a>