Commit d889f99

Merge branch 'dygraph' into ppocr_v3_doc
2 parents 9b15c7f + 5a08a40 commit d889f99

64 files changed (+2769, -366 lines)

PPOCRLabel/PPOCRLabel.py

+1-1
@@ -1733,7 +1733,7 @@ def _saveFile(self, annotationFilePath, mode='Manual'):
             width, height = self.image.width(), self.image.height()
             for shape in self.canvas.lockedShapes:
                 box = [[int(p[0] * width), int(p[1] * height)] for p in shape['ratio']]
-                assert len(box) == 4
+                # assert len(box) == 4
                 result = [(shape['transcription'], 1)]
                 result.insert(0, box)
                 self.result_dic_locked.append(result)
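The loop in this hunk scales each locked shape's normalized `ratio` points by the image width and height, truncates them with `int()`, and stores the result as `[box, (transcription, 1)]`. Commenting out the `assert` means boxes with more or fewer than four points are now kept instead of raising `AssertionError`. The sketch below reproduces that conversion outside the Qt application. `locked_shapes_to_results` is a hypothetical name, and it assumes each shape is a plain dict with `ratio` and `transcription` keys, as the hunk's indexing suggests.

```python
from typing import Any, Dict, List, Sequence, Tuple


def locked_shapes_to_results(
    shapes: Sequence[Dict[str, Any]], width: int, height: int
) -> List[list]:
    """Hypothetical standalone version of the loop in PPOCRLabel's _saveFile.

    Each point (rx, ry) in shape["ratio"] is a fraction of the image size;
    it is scaled to pixels and truncated with int(). No corner-count check
    is applied, matching the commented-out assert.
    """
    results = []
    for shape in shapes:
        box = [[int(p[0] * width), int(p[1] * height)] for p in shape["ratio"]]
        result: List[Any] = [(shape["transcription"], 1)]
        result.insert(0, box)  # final layout: [box, (transcription, 1)]
        results.append(result)
    return results
```

Called with a four-point and a five-point shape, both are returned; nothing in this loop enforces a corner count anymore.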

README.md

+7-5
@@ -68,6 +68,8 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 
 | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
 | ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| Chinese and English ultra-lightweight PP-OCRv3 model(16.2M) | ch_PP-OCRv3_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
+| English ultra-lightweight PP-OCRv3 model(13.4M) | en_PP-OCRv3_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
 | Chinese and English ultra-lightweight PP-OCRv2 model(11.6M) | ch_PP-OCRv2_xx |Mobile & Server|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)|
 | Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) |
 | Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) |
@@ -101,7 +103,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 - [PP-Structure 🔥](./ppstructure/README.md)
 - [Quick Start](./ppstructure/docs/quickstart_en.md)
 - [Model Zoo](./ppstructure/docs/models_list_en.md)
-- [Model training](./doc/doc_en/training_en.md)
+- [Model training](./doc/doc_en/training_en.md)
 - [Layout Parser](./ppstructure/layout/README.md)
 - [Table Recognition](./ppstructure/table/README.md)
 - [DocVQA](./ppstructure/vqa/README.md)
@@ -121,9 +123,9 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 - [Other Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
 - [Other Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
 - Datasets
-- [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md)
-- [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md)
-- [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
+- [General OCR Datasets(Chinese/English)](doc/doc_en/dataset/datasets_en.md)
+- [HandWritten_OCR_Datasets(Chinese)](doc/doc_en/dataset/handwritten_datasets_en.md)
+- [Various OCR Datasets(multilingual)](doc/doc_en/dataset/vertical_and_multilingual_datasets_en.md)
 - [Code Structure](./doc/doc_en/tree_en.md)
 - [Visualization](#Visualization)
 - [Community](#Community)
@@ -170,4 +172,4 @@ More details, please refer to [Multilingual OCR Development Plan](https://github
 
 <a name="LICENSE"></a>
 ## License
-This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>
+This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>
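The two new PP-OCRv3 rows share one URL layout, `https://paddleocr.bj.bcebos.com/PP-OCRv3/<chinese|english>/<ch|en>_PP-OCRv3_<det|rec>_...tar`, while the direction classifier is still the `dygraph_v2.0` mobile model. The sketch below rebuilds those links and unpacks a downloaded archive using only the standard library. The helper names are hypothetical, and the pattern is inferred solely from the rows in this table, so check any generated URL against the table before relying on it.

```python
import tarfile
import urllib.request
from pathlib import Path

BASE_URL = "https://paddleocr.bj.bcebos.com"
# Direction classifier shared by the PP-OCRv3 rows (the v2.0 mobile classifier).
CLS_PREFIX = f"{BASE_URL}/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls"


def ppocrv3_urls(lang: str) -> dict:
    """Return the PP-OCRv3 download links listed in the README table (hypothetical helper).

    lang is "ch" (Chinese and English model) or "en" (English model).
    The pattern is inferred from the table rows, not from an official API.
    """
    folders = {"ch": "chinese", "en": "english"}
    if lang not in folders:
        raise ValueError(f"unsupported lang {lang!r}; expected 'ch' or 'en'")
    prefix = f"{BASE_URL}/PP-OCRv3/{folders[lang]}/{lang}_PP-OCRv3"
    return {
        "det_infer": f"{prefix}_det_infer.tar",
        "det_train": f"{prefix}_det_distill_train.tar",
        "cls_infer": f"{CLS_PREFIX}_infer.tar",
        "cls_train": f"{CLS_PREFIX}_train.tar",
        "rec_infer": f"{prefix}_rec_infer.tar",
        "rec_train": f"{prefix}_rec_train.tar",
    }


def download_and_extract(url: str, dest_dir: str) -> Path:
    """Download a .tar archive if it is not already present, then unpack it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    # Only extract archives from sources you trust: tar members can carry unsafe paths.
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return dest
```

For example, `download_and_extract(ppocrv3_urls("en")["det_infer"], "./inference")` would fetch and unpack the English PP-OCRv3 detection inference model.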

README_ch.md

+14-12
@@ -71,6 +71,8 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
 
 | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
 | ------------------------------------- | ----------------------- | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| Chinese and English ultra-lightweight PP-OCRv3 model (16.2M) | ch_PP-OCRv3_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
+| English ultra-lightweight PP-OCRv3 model (13.4M) | en_PP-OCRv3_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar) |
 | Chinese and English ultra-lightweight PP-OCRv2 model (13.0M) | ch_PP-OCRv2_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
 | Chinese and English ultra-lightweight PP-OCR mobile model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
 | Chinese and English general PP-OCR server model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
@@ -128,12 +130,12 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
 - [Other data annotation tools](./doc/doc_ch/data_annotation.md)
 - [Other data synthesis tools](./doc/doc_ch/data_synthesis.md)
 - Datasets
-- [General Chinese and English OCR datasets](./doc/doc_ch/datasets.md)
-- [Handwritten Chinese OCR datasets](./doc/doc_ch/handwritten_datasets.md)
-- [Vertical-domain and multilingual OCR datasets](./doc/doc_ch/vertical_and_multilingual_datasets.md)
-- [Layout analysis datasets](./doc/doc_ch/layout_datasets.md)
-- [Table recognition datasets](./doc/doc_ch/table_datasets.md)
-- [DocVQA datasets](./doc/doc_ch/docvqa_datasets.md)
+- [General Chinese and English OCR datasets](doc/doc_ch/dataset/datasets.md)
+- [Handwritten Chinese OCR datasets](doc/doc_ch/dataset/handwritten_datasets.md)
+- [Vertical-domain and multilingual OCR datasets](doc/doc_ch/dataset/vertical_and_multilingual_datasets.md)
+- [Layout analysis datasets](doc/doc_ch/dataset/layout_datasets.md)
+- [Table recognition datasets](doc/doc_ch/dataset/table_datasets.md)
+- [DocVQA datasets](doc/doc_ch/dataset/docvqa_datasets.md)
 - [Code structure](./doc/doc_ch/tree.md)
 - [Visualization](#效果展示)
 - ["Dive into OCR" e-book📚](./doc/doc_ch/ocr_book.md)
@@ -160,13 +162,13 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
 <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
 <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
 </div>
-
+
 </details>
 
 
 <details open>
 <summary>PP-OCRv2 English model</summary>
-
+
 <div align="center">
 <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/img_12.jpg" width="800">
 </div>
@@ -176,12 +178,12 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
 
 <details open>
 <summary>PP-OCRv2 models for other languages</summary>
-
+
 <div align="center">
 <img src="./doc/imgs_results/french_0.jpg" width="800">
 <img src="./doc/imgs_results/korean.jpg" width="800">
 </div>
-
+
 </details>
 
 <details open>
@@ -196,8 +198,8 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
 <div align="center">
 <img src="./ppstructure/docs/vqa/result_ser/zh_val_0_ser.jpg" width="800">
 </div>
-
-- RE (relation extraction)
+
+- RE (relation extraction)
 <div align="center">
 <img src="./ppstructure/docs/vqa/result_re/zh_val_21_re.jpg" width="800">
 </div>
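Both READMEs move their dataset links from `./doc/doc_*/` into a new `dataset/` subfolder. One way to catch links left dangling by such a move is to scan each Markdown file for inline relative links and check that every target exists on disk. Below is a minimal standard-library sketch; `find_broken_links` is a hypothetical helper, the regex handles only inline `[text](target)` links without titles, and external URLs and in-page anchors are skipped rather than verified.

```python
import re
from pathlib import Path
from typing import List

# Inline Markdown links of the form [text](target); reference-style links
# and links with a quoted title are not matched by this simplified pattern.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SKIP_PREFIXES = ("http://", "https://", "mailto:", "#")


def find_broken_links(md_file) -> List[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    md_path = Path(md_file)
    broken = []
    for target in LINK_RE.findall(md_path.read_text(encoding="utf-8")):
        if target.startswith(SKIP_PREFIXES):
            continue  # external URLs and in-page anchors are not checked
        path_part = target.split("#", 1)[0]  # drop any "#section" suffix
        if not (md_path.parent / path_part).exists():
            broken.append(target)
    return broken
```

Running it over `README.md` and `README_ch.md` from the repository root would list any dataset link that still points at the old location.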

applications/多模态表单识别.md

+1-1
@@ -16,7 +16,7 @@
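 <center><img src='https://ai-studio-static-online.cdn.bcebos.com/9bd844b970f94e5ba0bc0c5799bd819ea9b1861bb306471fabc2d628864d418e'></center>
 <center>Figure 1: Multimodal form recognition flowchart</center>
 
-Note: You are welcome to claim free computing power on AIStudio to try the online hands-on training. Project link: Multimodal Form Recognition](https://aistudio.baidu.com/aistudio/projectdetail/3815918) (equipped with advanced computing resources such as Tesla V100 and A100)
+Note: You are welcome to claim free computing power on AIStudio to try the online hands-on training. Project link: [Multimodal Form Recognition](https://aistudio.baidu.com/aistudio/projectdetail/3815918) (equipped with advanced computing resources such as Tesla V100 and A100)
 
 
 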
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,234 @@
+Global:
+  debug: false
+  use_gpu: true
+  epoch_num: 500
+  log_smooth_window: 20
+  print_batch_step: 10
+  save_model_dir: ./output/ch_PP-OCR_v3_det/
+  save_epoch_step: 100
+  eval_batch_step:
+  - 0
+  - 400
+  cal_metric_during_train: false
+  pretrained_model: null
+  checkpoints: null
+  save_inference_dir: null
+  use_visualdl: false
+  infer_img: doc/imgs_en/img_10.jpg
+  save_res_path: ./checkpoints/det_db/predicts_db.txt
+  distributed: true
+
+Architecture:
+  name: DistillationModel
+  algorithm: Distillation
+  model_type: det
+  Models:
+    Student:
+      model_type: det
+      algorithm: DB
+      Transform: null
+      Backbone:
+        name: MobileNetV3
+        scale: 0.5
+        model_name: large
+        disable_se: true
+      Neck:
+        name: RSEFPN
+        out_channels: 96
+        shortcut: True
+      Head:
+        name: DBHead
+        k: 50
+    Student2:
+      model_type: det
+      algorithm: DB
+      Transform: null
+      Backbone:
+        name: MobileNetV3
+        scale: 0.5
+        model_name: large
+        disable_se: true
+      Neck:
+        name: RSEFPN
+        out_channels: 96
+        shortcut: True
+      Head:
+        name: DBHead
+        k: 50
+    Teacher:
+      freeze_params: true
+      return_all_feats: false
+      model_type: det
+      algorithm: DB
+      Backbone:
+        name: ResNet
+        in_channels: 3
+        layers: 50
+      Neck:
+        name: LKPAN
+        out_channels: 256
+      Head:
+        name: DBHead
+        kernel_list: [7,2,2]
+        k: 50
+
+Loss:
+  name: CombinedLoss
+  loss_config_list:
+  - DistillationDilaDBLoss:
+      weight: 1.0
+      model_name_pairs:
+      - ["Student", "Teacher"]
+      - ["Student2", "Teacher"]
+      key: maps
+      balance_loss: true
+      main_loss_type: DiceLoss
+      alpha: 5
+      beta: 10
+      ohem_ratio: 3
+  - DistillationDMLLoss:
+      model_name_pairs:
+      - ["Student", "Student2"]
+      maps_name: "thrink_maps"
+      weight: 1.0
+      # act: None
+      model_name_pairs: ["Student", "Student2"]
+      key: maps
+  - DistillationDBLoss:
+      weight: 1.0
+      model_name_list: ["Student", "Student2"]
+      # key: maps
+      # name: DBLoss
+      balance_loss: true
+      main_loss_type: DiceLoss
+      alpha: 5
+      beta: 10
+      ohem_ratio: 3
+
+Optimizer:
+  name: Adam
+  beta1: 0.9
+  beta2: 0.999
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+    warmup_epoch: 2
+  regularizer:
+    name: L2
+    factor: 5.0e-05
+
+PostProcess:
+  name: DistillationDBPostProcess
+  model_name: ["Student"]
+  key: head_out
+  thresh: 0.3
+  box_thresh: 0.6
+  max_candidates: 1000
+  unclip_ratio: 1.5
+
+Metric:
+  name: DistillationMetric
+  base_metric_name: DetMetric
+  main_indicator: hmean
+  key: "Student"
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/icdar2015/text_localization/
+    label_file_list:
+    - ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
+    ratio_list: [1.0]
+    transforms:
+    - DecodeImage:
+        img_mode: BGR
+        channel_first: false
+    - DetLabelEncode: null
+    - CopyPaste:
+    - IaaAugment:
+        augmenter_args:
+        - type: Fliplr
+          args:
+            p: 0.5
+        - type: Affine
+          args:
+            rotate:
+            - -10
+            - 10
+        - type: Resize
+          args:
+            size:
+            - 0.5
+            - 3
+    - EastRandomCropData:
+        size:
+        - 960
+        - 960
+        max_tries: 50
+        keep_ratio: true
+    - MakeBorderMap:
+        shrink_ratio: 0.4
+        thresh_min: 0.3
+        thresh_max: 0.7
+    - MakeShrinkMap:
+        shrink_ratio: 0.4
+        min_text_size: 8
+    - NormalizeImage:
+        scale: 1./255.
+        mean:
+        - 0.485
+        - 0.456
+        - 0.406
+        std:
+        - 0.229
+        - 0.224
+        - 0.225
+        order: hwc
+    - ToCHWImage: null
+    - KeepKeys:
+        keep_keys:
+        - image
+        - threshold_map
+        - threshold_mask
+        - shrink_map
+        - shrink_mask
+  loader:
+    shuffle: true
+    drop_last: false
+    batch_size_per_card: 8
+    num_workers: 4
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/icdar2015/text_localization/
+    label_file_list:
+    - ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
+    transforms:
+    - DecodeImage:
+        img_mode: BGR
+        channel_first: false
+    - DetLabelEncode: null
+    - DetResizeForTest: null
+    - NormalizeImage:
+        scale: 1./255.
+        mean:
+        - 0.485
+        - 0.456
+        - 0.406
+        std:
+        - 0.229
+        - 0.224
+        - 0.225
+        order: hwc
+    - ToCHWImage: null
+    - KeepKeys:
+        keep_keys:
+        - image
+        - shape
+        - polys
+        - ignore_tags
+  loader:
+    shuffle: false
+    drop_last: false
+    batch_size_per_card: 1
+    num_workers: 2
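Two values in this config control how far text polygons are offset: `shrink_ratio: 0.4` in `MakeShrinkMap`, which shrinks ground-truth polygons to build the probability-map target, and `unclip_ratio: 1.5` in the post-processor, which expands predicted regions back out. The DB (Differentiable Binarization) paper defines these offsets as D = A(1 - r^2)/L for shrinking and D' = A' * r'/L' for unclipping, where A is the polygon area and L its perimeter. The sketch below computes only these distances, using the shoelace formula for the area; it does not perform the polygon offset itself (PaddleOCR uses pyclipper for that step), and the function names are illustrative rather than PaddleOCR's code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(poly: Sequence[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(poly: Sequence[Point]) -> float:
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))


def shrink_distance(poly: Sequence[Point], shrink_ratio: float = 0.4) -> float:
    """Inward offset D = A * (1 - r^2) / L used when building the shrink map."""
    return polygon_area(poly) * (1 - shrink_ratio ** 2) / polygon_perimeter(poly)


def unclip_distance(poly: Sequence[Point], unclip_ratio: float = 1.5) -> float:
    """Outward offset D' = A' * r' / L' used to expand a predicted region."""
    return polygon_area(poly) * unclip_ratio / polygon_perimeter(poly)
```

For a 100 x 20 box (area 2000, perimeter 240), the shrink offset is 2000 * 0.84 / 240 = 7 px and the unclip offset is 2000 * 1.5 / 240 = 12.5 px.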
