docs: fix typos #14816

Merged
merged 1 commit into from
Mar 6, 2025
4 changes: 2 additions & 2 deletions docs/ppocr/model_train/detection.en.md
@@ -42,7 +42,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
In the above instruction, use `-c` to select the training to use the `configs/det/det_mv3_db.yml` configuration file.
For a detailed explanation of the configuration file, please refer to [config](../blog/config.en.md).

- You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
+ You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001.

```bash linenums="1"
# single GPU training
@@ -244,5 +244,5 @@ Q1: The prediction results of trained model and inference model are inconsistent

**A**: Most of the problems are caused by the inconsistency of the pre-processing and post-processing parameters during the prediction of the trained model and the pre-processing and post-processing parameters during the prediction of the inference model. Taking the model trained by the det_mv3_db.yml configuration file as an example, the solution to the problem of inconsistent prediction results between the training model and the inference model is as follows:

- - Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the prediction [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy. In order to be consistent with the paper, the image is resized to [736, 1280] in the training icdar15 configuration file, but there is only a set of default parameters when the inference model predicts, which will be considered To predict the speed problem, the longest side of the image is limited to 960 for resize by default. The preprocessing function of the training model preprocessing and the inference model is located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147)
+ - Check whether the [trained model preprocessing](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L116) is consistent with the prediction [preprocessing function of the inference model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/predict_det.py#L42). When the algorithm is evaluated, the input image size will affect the accuracy. In order to be consistent with the paper, the image is resized to [736, 1280] in the training icdar15 configuration file, but there is only a set of default parameters when the inference model predicts, which will be considered. To predict the speed problem, the longest side of the image is limited to 960 for resize by default. The preprocessing function of the training model preprocessing and the inference model is located in [ppocr/data/imaug/operators.py](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/ppocr/data/imaug/operators.py#L147).
- Check whether the [post-processing of the trained model](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/configs/det/det_mv3_db.yml#L51) is consistent with the [post-processing parameters of the inference](https://github.com/PaddlePaddle/PaddleOCR/blob/c1ed243fb68d5d466258243092e56cbae32e2c14/tools/infer/utility.py#L50).
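
The shape mismatch described in the first check can be illustrated with a minimal Python sketch. The helper name `limit_longest_side` and the rounding of each side to a multiple of 32 are assumptions made for illustration; they are not copied from `ppocr/data/imaug/operators.py`, and the real operator may behave differently.

```python
def limit_longest_side(height, width, limit=960, multiple=32):
    """Scale (height, width) so that the longest side is at most `limit`.

    Hypothetical helper: rounding each side to a multiple of 32 is an
    assumption, not the behaviour of the actual PaddleOCR operator.
    """
    longest = max(height, width)
    # Only shrink images whose longest side exceeds the limit.
    scale = limit / longest if longest > limit else 1.0
    new_h = max(int(round(height * scale / multiple)) * multiple, multiple)
    new_w = max(int(round(width * scale / multiple)) * multiple, multiple)
    return new_h, new_w


# Fixed evaluation shape named in the text for the icdar15 training config.
train_eval_shape = (736, 1280)

# Shape this sketch produces for a 720x1280 image under the default limit of 960.
inference_shape = limit_longest_side(720, 1280)

if inference_shape != train_eval_shape:
    print("input shapes differ:", train_eval_shape, "vs", inference_shape)
```

If the two shapes differ, the trained model and the inference model see different inputs, which is one plausible source of the inconsistent predictions discussed in Q1.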
10 changes: 5 additions & 5 deletions docs/ppocr/model_train/finetune.en.md
@@ -16,9 +16,9 @@ The core points of this article are as follows:
2. Adding a small amount of real data (detection:>=500, recognition:>=5000) will greatly improve the detection and recognition effect of vertical scenes
3. When fine-tuning the model, adding real general scene data can further improve the model accuracy and generalization performance
4. In the text detection task, increasing the prediction shape of the image can further improve the detection effect of the smaller text area
- 5. When fine-tuning the model, it is necessary to properly adjust the hyperparameters (learning rate, batch size are the most important) to obtain a better fine-tuning effect.
+ 5. When fine-tuning the model, it is necessary to properly adjust the hyperparameters (learning rate and batch size are the most important) to obtain a better fine-tuning effect.

- For more details, please refer to Chapter 2 and Chapter 3
+ For more details, please refer to Chapter 2 and Chapter 3.

## 2. Text detection model fine-tuning

@@ -32,13 +32,13 @@ For more details, please refer to Chapter 2 and Chapter 3。

It is recommended to choose the PP-OCRv3 model (configuration file: [ch_PP-OCRv3_det_student.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml),pre-trained model: [ch_PP-OCRv3_det_distill_train.tar](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar), its accuracy and generalization performance is the best pre-training model currently available.

- For more PP-OCR series models, please refer to [PP-OCR Series Model Library](../models_list.en.md)
+ For more PP-OCR series models, please refer to [PP-OCR Series Model Library](../models_list.en.md).

Note: When using the above pre-trained model, you need to use the `student.pdparams` file in the folder as the pre-trained model, that is, only use the student model.

### 2.3 Training hyperparameter

- When fine-tuning the model, the most important hyperparameter is the pre-training model path `pretrained_model`, `learning_rate``batch_size`,some hyperparameters are as follows:
+ When fine-tuning the model, the most important hyperparameter is the pre-training model path `pretrained_model`, `learning_rate` and `batch_size`,some hyperparameters are as follows:

```yaml linenums="1"
Global:
@@ -80,7 +80,7 @@ When exporting and inferring the trained model, you can further adjust the predi
| use_dilation | bool | False | Whether to expand the segmentation results to obtain better detection results |
| det_db_score_mode | str | "fast" | DB's detection result score calculation method supports `fast` and `slow`. `fast` calculates the average score based on all pixels in the polygon’s circumscribed rectangle border, and `slow` calculates the average score based on all pixels in the original polygon. The calculation speed is relatively slower, but more accurate. |

- For more information on inference methods, please refer to[Paddle Inference doc](../infer_deploy/python_infer.en.md)
+ For more information on inference methods, please refer to[Paddle Inference doc](../infer_deploy/python_infer.en.md).

## 3. Text recognition model fine-tuning

6 changes: 3 additions & 3 deletions docs/ppocr/model_train/training.en.md
@@ -12,7 +12,7 @@ At the same time, it will briefly introduce the structure of the training data a

The PaddleOCR uses configuration files to control network training and evaluation parameters. In the configuration file, you can set the model, optimizer, loss function, and pre- and post-processing parameters of the model. PaddleOCR reads these parameters from the configuration file, and then builds a complete training process to train the model. Fine-tuning can also be completed by modifying the parameters in the configuration file, which is simple and convenient.

- For the complete configuration file description, please refer to [Configuration File](../blog/config.en.md)
+ For the complete configuration file description, please refer to [Configuration File](../blog/config.en.md).

## 2. Basic Concepts

@@ -21,7 +21,7 @@ During the model training process, some hyper-parameters can be manually specifi
### 2.1 Learning Rate

The learning rate is one of the most important hyper-parameters for training neural networks. It represents the step length of the gradient moving towards the optimal solution of the loss function in each iteration.
- A variety of learning rate update strategies are provided by PaddleOCR, which can be specified in configuration files. For example,
+ A variety of learning rate update strategies are provided by PaddleOCR, which can be specified in configuration files. For example:

```yaml linenums="1"
Optimizer:
@@ -59,7 +59,7 @@ Optimizer:

(2) Recognition stage: Character recognition accuracy, that is, the ratio of correctly recognized text lines to the number of marked text lines. Only the entire line of text recognition pairs can be regarded as correct recognition.

- (3) End-to-end statistics: End-to-end recall rate: accurately detect and correctly identify the proportion of text lines in all labeled text lines; End-to-end accuracy rate: accurately detect and correctly identify the number of text lines in the detected text lines The standard for accurate detection is that the IOU of the detection box and the labeled box is greater than a certain threshold, and the text in the correctly identified detection box is the same as the labeled text.
+ (3) End-to-end statistics: End-to-end recall rate: accurately detect and correctly identify the proportion of text lines in all labeled text lines; End-to-end accuracy rate: accurately detect and correctly identify the number of text lines in the detected text lines. The standard for accurate detection is that the IOU of the detection box and the labeled box is greater than a certain threshold, and the text in the correctly identified detection box is the same as the labeled text.
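
The end-to-end statistics in (3) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not PaddleOCR's evaluation code: the function names `iou` and `end_to_end_metrics` are hypothetical, boxes are simplified to axis-aligned `(x1, y1, x2, y2)` tuples, the IOU threshold of 0.5 is an arbitrary choice, and each labeled line is matched greedily at most once.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def end_to_end_metrics(predictions, labels, iou_threshold=0.5):
    """Return (recall, accuracy) for lists of (box, text) pairs.

    A prediction counts as correct only when its box overlaps an unused
    labeled box above the threshold AND its text equals the labeled text.
    """
    used = set()   # indices of labeled lines that are already matched
    correct = 0
    for pred_box, pred_text in predictions:
        for idx, (gt_box, gt_text) in enumerate(labels):
            if idx in used:
                continue
            if iou(pred_box, gt_box) > iou_threshold and pred_text == gt_text:
                used.add(idx)
                correct += 1
                break
    recall = correct / len(labels) if labels else 0.0
    accuracy = correct / len(predictions) if predictions else 0.0
    return recall, accuracy


labels = [((0, 0, 10, 10), "abc"), ((20, 0, 30, 10), "def")]
predictions = [
    ((0, 0, 10, 10), "abc"),    # right box, right text
    ((20, 0, 30, 10), "dxf"),   # right box, wrong text
    ((50, 50, 60, 60), "zzz"),  # no matching labeled box
]
recall, accuracy = end_to_end_metrics(predictions, labels)
print(recall, accuracy)
```

In this toy example only one of the two labeled lines and one of the three detected lines is both located and read correctly, so the recall rate is computed over the labeled lines and the accuracy rate over the detected lines, as the definition above describes.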

## 3. Data and Vertical Scenes
