Commit 0157a77

* fix docs * update
1 parent 458afe6 commit 0157a77

8 files changed: +124 -157 lines changed

README.md

Lines changed: 28 additions & 45 deletions
Original file line number | Diff line number | Diff line change
@@ -4,14 +4,16 @@
44
</p>
55

66
<!-- language -->
7-
中文 | [English](./README_en.md) | [日本語](./README_ja.md)
7+
中文 | [English](./README_en.md)
88

99
<!-- icon -->
1010

1111
[![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR)
1212
[![Downloads](https://img.shields.io/pypi/dm/paddleocr)](https://pypi.org/project/PaddleOCR/)
13-
![python](https://img.shields.io/badge/python-3.8+-aff.svg)
13+
![python](https://img.shields.io/badge/python-3.8~3.12-aff.svg)
1414
![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg)
15+
![hardware](https://img.shields.io/badge/hardware-cpu%2C%20gpu%2C%20xpu%2C%20npu-yellow.svg)
16+
1517

1618
[![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=)](https://www.paddleocr.ai/)
1719
[![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
@@ -24,9 +26,9 @@
2426
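Since its release, PaddleOCR has been embraced by academia, industry, and the research community for its cutting-edge algorithms and proven real-world deployments. It is used in many well-known open-source projects, such as Umi-OCR, OmniParser, MinerU, and RAGFlow, and has become the open-source OCR toolkit of choice for many developers. On May 20, 2025, the PaddlePaddle team released **PaddleOCR 3.0**, fully compatible with the **official release of the PaddlePaddle 3.0 framework**. It further **improves text-recognition accuracy**, supports **recognition of multiple text types** and **handwriting recognition**, and meets the strong demand from large-model applications for **high-precision parsing of complex documents**. Combined with **ERNIE 4.5 Turbo**, it significantly improves key-information extraction accuracy, and it adds **support for domestic hardware such as KUNLUNXIN and Ascend**.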
2527

2628
PaddleOCR 3.0 **adds** three major capabilities:
27-
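- Universal-scene text recognition model [PP-OCRv5](docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.md): a single model supports five text types and complex handwriting recognition; overall recognition accuracy is **13 percentage points higher** than the previous generation.
28-
- General document-parsing solution [PP-StructureV3](docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.md): supports high-precision parsing of multi-scene, multi-layout PDFs and **outperforms many open-source and closed-source solutions** on public benchmarks.
29-
- Intelligent document-understanding solution [PP-ChatOCRv4](docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.md): natively supports ERNIE 4.5 Turbo, with accuracy **15 percentage points higher** than the previous generation.
29+
- Universal-scene text recognition model [PP-OCRv5](docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.md): a single model supports five text types and complex handwriting recognition; overall recognition accuracy is **13 percentage points higher** than the previous generation. [Online demo](https://aistudio.baidu.com/community/app/91660/webUI)
30+
- General document-parsing solution [PP-StructureV3](docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.md): supports high-precision parsing of multi-scene, multi-layout PDFs and **outperforms many open-source and closed-source solutions** on public benchmarks. [Online demo](https://aistudio.baidu.com/community/app/518494/webUI)
31+
- Intelligent document-understanding solution [PP-ChatOCRv4](docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.md): natively supports ERNIE 4.5 Turbo, with accuracy **15 percentage points higher** than the previous generation. [Online demo](https://aistudio.baidu.com/community/app/518493/webUI)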
3032

3133
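Beyond an excellent model library, PaddleOCR 3.0 also provides easy-to-learn, easy-to-use tools covering model training, inference, and service deployment, so developers can quickly bring AI applications to production.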
3234
<div align="center">
@@ -68,18 +70,18 @@ Beyond an excellent model library, PaddleOCR 3.0 also provides easy-to-learn, easy-to-use tools
6870

6971
```bash
7072
# Install paddleocr
71-
pip install paddleocr
73+
pip install paddleocr==3.0.0
7274
```
7375

7476
### 3. Run inference by CLI
7577
```bash
7678
# Run PP-OCRv5 inference
77-
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False
79+
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False
7880

7981
# Run PP-StructureV3 inference
80-
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png --use_doc_orientation_classify False --use_doc_unwarping False
82+
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png --use_doc_orientation_classify False --use_doc_unwarping False
8183

82-
# Before running PP-ChatOCRv4 inference, first obtain a Qianfan KPI Key
84+
# Before running PP-ChatOCRv4 inference, first obtain a Qianfan API Key
8385
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --use_doc_orientation_classify False --use_doc_unwarping False
8486

8587
# Show the detailed parameters of "paddleocr ocr"
@@ -91,9 +93,13 @@ paddleocr ocr --help
9193
```python
9294
from paddleocr import PaddleOCR
9395
# Initialize the PaddleOCR instance
94-
ocr = PaddleOCR()
96+
ocr = PaddleOCR(
97+
use_doc_orientation_classify=False,
98+
use_doc_unwarping=False,
99+
use_textline_orientation=False)
95100
# Run OCR inference on a sample image
96-
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
101+
result = ocr.predict(
102+
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
97103
# Visualize the results and save the JSON results
98104
for res in result:
99105
    res.print()
@@ -108,45 +114,21 @@ for res in result:
108114
from pathlib import Path
109115
from paddleocr import PPStructureV3
110116

111-
pipeline = PPStructureV3()
117+
pipeline = PPStructureV3(
118+
use_doc_orientation_classify=False,
119+
use_doc_unwarping=False
120+
)
112121

113122
# For Image
114-
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png")
123+
output = pipeline.predict(
124+
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png",
125+
)
115126

116127
# Visualize the results and save the JSON results
117128
for res in output:
118129
    res.print()
119130
    res.save_to_json(save_path="output")
120131
    res.save_to_markdown(save_path="output")
121-
122-
# For PDF File
123-
input_file = "./your_pdf_file.pdf"
124-
output_path = Path("./output")
125-
126-
output = pipeline.predict(input_file)
127-
128-
markdown_list = []
129-
markdown_images = []
130-
131-
for res in output:
132-
md_info = res.markdown
133-
markdown_list.append(md_info)
134-
markdown_images.append(md_info.get("markdown_images", {}))
135-
136-
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
137-
138-
mkd_file_path = output_path / f"{Path(input_file).stem}.md"
139-
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
140-
141-
with open(mkd_file_path, "w", encoding="utf-8") as f:
142-
f.write(markdown_texts)
143-
144-
for item in markdown_images:
145-
if item:
146-
for path, image in item.items():
147-
file_path = output_path / path
148-
file_path.parent.mkdir(parents=True, exist_ok=True)
149-
image.save(file_path)
150132
```
151133

152134
</details>
@@ -182,12 +164,13 @@ mllm_chat_bot_config = {
182164
"api_key": "api_key", # your api_key
183165
}
184166

185-
pipeline = PPChatOCRv4Doc()
167+
pipeline = PPChatOCRv4Doc(
168+
use_doc_orientation_classify=False,
169+
use_doc_unwarping=False
170+
)
186171

187172
visual_predict_res = pipeline.visual_predict(
188173
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
189-
use_doc_orientation_classify=False,
190-
use_doc_unwarping=False,
191174
use_common_ocr=True,
192175
use_seal_recognition=True,
193176
use_table_recognition=True,
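
The README.md hunks in this commit set the document-preprocessing switches (`use_doc_orientation_classify`, `use_doc_unwarping`) on the pipeline constructors instead of relying on defaults or on per-call arguments. As a minimal sketch of that pattern, the snippet below keeps the two switches in one dictionary and applies them to whichever pipeline class is passed in. The names `DOC_PREPROCESSING_OFF` and `build_pipeline` are hypothetical helpers introduced here for illustration; they are not part of PaddleOCR.

```python
# Switch names are taken from the diff; grouping them in a dict is an
# illustrative choice, not something the README itself does.
DOC_PREPROCESSING_OFF = {
    "use_doc_orientation_classify": False,
    "use_doc_unwarping": False,
}


def build_pipeline(pipeline_cls, **overrides):
    """Instantiate pipeline_cls with preprocessing disabled unless overridden.

    pipeline_cls is assumed to accept these switches as keyword arguments,
    as the updated README examples show for the constructors they change.
    """
    options = {**DOC_PREPROCESSING_OFF, **overrides}
    return pipeline_cls(**options)
```

With PaddleOCR installed, `build_pipeline(PPStructureV3)` after `from paddleocr import PPStructureV3` would be equivalent to the constructor call shown in the updated README.md, under the assumption stated in the docstring.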

README_en.md

Lines changed: 24 additions & 45 deletions
@@ -10,8 +10,9 @@
1010

1111
[![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR)
1212
[![Downloads](https://img.shields.io/pypi/dm/paddleocr)](https://pypi.org/project/PaddleOCR/)
13-
![python](https://img.shields.io/badge/python-3.8+-aff.svg)
13+
![python](https://img.shields.io/badge/python-3.8~3.12-aff.svg)
1414
![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg)
15+
![hardware](https://img.shields.io/badge/hardware-cpu%2C%20gpu%2C%20xpu%2C%20npu-yellow.svg)
1516

1617
[![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=)](https://www.paddleocr.ai/)
1718
[![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
@@ -26,11 +27,11 @@ Since its initial release, PaddleOCR has gained widespread acclaim across academ
2627
On May 20, 2025, the PaddlePaddle team unveiled PaddleOCR 3.0, fully compatible with the official release of the **PaddlePaddle 3.0** framework. This update further **boosts text-recognition accuracy**, adds support for **multiple text-type recognition** and **handwriting recognition**, and meets the growing demand from large-model applications for **high-precision parsing of complex documents**. When combined with the **ERNIE 4.5T**, it significantly enhances key-information extraction accuracy. PaddleOCR 3.0 also introduces support for domestic hardware platforms such as **KUNLUNXIN** and **Ascend**.
2728

2829
Three Major New Features in PaddleOCR 3.0:
29-
- Universal-Scene Text Recognition Model [PP-OCRv5](./docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy has increased by 13 percentage points over the previous generation.
30+
- Universal-Scene Text Recognition Model [PP-OCRv5](./docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy has increased by 13 percentage points over the previous generation. [Online Demo](https://aistudio.baidu.com/community/app/91660/webUI)
3031

31-
- General Document-Parsing Solution [PP-StructureV3](./docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks.
32+
- General Document-Parsing Solution [PP-StructureV3](./docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks. [Online Demo](https://aistudio.baidu.com/community/app/518494/webUI)
3233

33-
- Intelligent Document-Understanding Solution [PP-ChatOCRv4](./docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively powered by the WenXin large model 4.5T, achieving 15 percentage points higher accuracy than its predecessor.
34+
- Intelligent Document-Understanding Solution [PP-ChatOCRv4](./docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively powered by the WenXin large model 4.5T, achieving 15 percentage points higher accuracy than its predecessor. [Online Demo](https://aistudio.baidu.com/community/app/518493/webUI)
3435

3536
In addition to providing an outstanding model library, PaddleOCR 3.0 also offers user-friendly tools covering model training, inference, and service deployment, so developers can rapidly bring AI applications to production.
3637
<div align="center">
@@ -86,19 +87,19 @@ Install PaddlePaddle refer to [Installation Guide](https://www.paddlepaddle.org.
8687

8788
```bash
8889
# Install paddleocr
89-
pip install paddleocr
90+
pip install paddleocr==3.0.0
9091
```
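
The badge change in this commit narrows the advertised Python range from `3.8+` to `3.8~3.12`, and the install command is now pinned to `paddleocr==3.0.0`. A small pre-install check along these lines can catch an unsupported interpreter early. This is only a sketch: the function name `is_supported_python` is hypothetical, and treating both ends of the badge range as inclusive is an assumption.

```python
import sys

# Range taken from the README badge ("python-3.8~3.12").
# Treating both ends as inclusive is an assumption.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if the (major, minor) version lies inside the badge range.

    When no version is given, the running interpreter is checked.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED
```

Calling `is_supported_python()` with no argument checks the current interpreter; passing a tuple such as `(3, 13)` checks a specific version.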
9192

9293
### 3. Run inference by CLI
9394
```bash
9495
# Run PP-OCRv5 inference
95-
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False
96+
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False
9697

9798
# Run PP-StructureV3 inference
9899
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png --use_doc_orientation_classify False --use_doc_unwarping False
99100

100101
# Get the Qianfan API Key at first, and then run PP-ChatOCRv4 inference
101-
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --use_doc_orientation_classify False --use_doc_unwarping False
102+
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --use_doc_orientation_classify False --use_doc_unwarping False
102103

103104
# Get more information about "paddleocr ocr"
104105
paddleocr ocr --help
@@ -107,13 +108,15 @@ paddleocr ocr --help
107108
### 4. Run inference by API
108109
**4.1 PP-OCRv5 Example**
109110
```python
110-
from paddleocr import PaddleOCR
111-
112111
# Initialize PaddleOCR instance
113-
ocr = PaddleOCR()
112+
ocr = PaddleOCR(
113+
use_doc_orientation_classify=False,
114+
use_doc_unwarping=False,
115+
use_textline_orientation=False)
114116

115117
# Run OCR inference on a sample image
116-
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
118+
result = ocr.predict(
119+
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
117120

118121
# Visualize the results and save the JSON results
119122
for res in result:
@@ -132,41 +135,17 @@ from paddleocr import PPStructureV3
132135
pipeline = PPStructureV3()
133136

134137
# For Image
135-
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png")
138+
output = pipeline.predict(
139+
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png",
140+
use_doc_orientation_classify=False,
141+
use_doc_unwarping=False
142+
)
136143

137144
# Visualize the results and save the JSON results
138145
for res in output:
139146
    res.print()
140-
res.save_to_json(save_path="output")
141-
res.save_to_markdown(save_path="output")
142-
# For PDF File
143-
input_file = "./your_pdf_file.pdf"
144-
output_path = Path("./output")
145-
146-
output = pipeline.predict(input_file)
147-
148-
markdown_list = []
149-
markdown_images = []
150-
151-
for res in output:
152-
md_info = res.markdown
153-
markdown_list.append(md_info)
154-
markdown_images.append(md_info.get("markdown_images", {}))
155-
156-
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
157-
158-
mkd_file_path = output_path / f"{Path(input_file).stem}.md"
159-
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
160-
161-
with open(mkd_file_path, "w", encoding="utf-8") as f:
162-
f.write(markdown_texts)
163-
164-
for item in markdown_images:
165-
if item:
166-
for path, image in item.items():
167-
file_path = output_path / path
168-
file_path.parent.mkdir(parents=True, exist_ok=True)
169-
image.save(file_path)
147+
    res.save_to_json(save_path="output")
148+
    res.save_to_markdown(save_path="output")
170149
```
171150

172151
</details>
@@ -201,12 +180,12 @@ mllm_chat_bot_config = {
201180
"api_key": "api_key", # your api_key
202181
}
203182

204-
pipeline = PPChatOCRv4Doc()
183+
pipeline = PPChatOCRv4Doc(
184+
use_doc_orientation_classify=False,
185+
use_doc_unwarping=False)
205186

206187
visual_predict_res = pipeline.visual_predict(
207188
input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
208-
use_doc_orientation_classify=False,
209-
use_doc_unwarping=False,
210189
use_common_ocr=True,
211190
use_seal_recognition=True,
212191
use_table_recognition=True,
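
The CLI flags in the README examples (`--use_doc_orientation_classify False` and so on) use the same names as the Python keyword arguments. The sketch below shows one way to assemble such a command line from keyword arguments, for instance to pass to `subprocess.run`. The helper `build_paddleocr_command` is hypothetical and the input file name is a placeholder; only the subcommand, the `-i` option, and the three flags come from the diff.

```python
import shlex


def build_paddleocr_command(subcommand, input_path, **flags):
    """Build an argument list such as: paddleocr ocr -i <input> --flag value."""
    args = ["paddleocr", subcommand, "-i", input_path]
    for name, value in flags.items():
        # The README passes booleans as the literal strings True / False.
        args.extend([f"--{name}", str(value)])
    return args


# Placeholder input path; the README examples use a hosted demo image instead.
command = build_paddleocr_command(
    "ocr",
    "general_ocr_002.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)
print(shlex.join(command))
```

Keeping the command as a list avoids shell-quoting problems when it is handed to `subprocess.run(command)`; `shlex.join` is used only to display it.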

docs/index.en.md

Lines changed: 5 additions & 4 deletions
@@ -13,16 +13,17 @@ hide:
1313

1414
Since its initial release, PaddleOCR has gained widespread acclaim across academia, industry, and research communities, thanks to its cutting-edge algorithms and proven performance in real-world applications. It’s already powering popular open-source projects like Umi-OCR, OmniParser, MinerU, and RAGFlow, making it the go-to OCR toolkit for developers worldwide.
1515

16-
On May 20, 2025, the PaddlePaddle team unveiled PaddleOCR 3.0, fully compatible with the official release of the [PaddlePaddle 3.0](https://github.com/PaddlePaddle/Paddle) framework. This update further **boosts text-recognition accuracy**, adds support for **multiple text-type recognition** and **handwriting recognition**, and meets the growing demand from large-model applications for **high-precision parsing of complex documents**. When combined with the **ERNIE 4.5T**, it significantly enhances key-information extraction accuracy. PaddleOCR 3.0 also introduces support for domestic hardware platforms such as **KUNLUNXIN** and **Ascend**.
16+
On May 20, 2025, the PaddlePaddle team unveiled PaddleOCR 3.0, fully compatible with the official release of the [PaddlePaddle 3.0](https://github.com/PaddlePaddle/Paddle) framework. This update further **boosts text-recognition accuracy**, adds support for **multiple text-type recognition** and **handwriting recognition**, and meets the growing demand from large-model applications for **high-precision parsing of complex documents**. When combined with the **ERNIE 4.5 Turbo**, it significantly enhances key-information extraction accuracy. PaddleOCR 3.0 also introduces support for domestic hardware platforms such as **KUNLUNXIN** and **Ascend**.
1717

1818

1919
Three Major New Features in PaddleOCR 3.0:
2020

21-
- 🖼️ Universal-Scene Text Recognition Model [PP-OCRv5](version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy has increased by 13 percentage points over the previous generation.
21+
- 🖼️ Universal-Scene Text Recognition Model [PP-OCRv5](version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy has increased by 13 percentage points over the previous generation. [Online Demo](https://aistudio.baidu.com/community/app/91660/webUI)
2222

23-
- 🧮 General Document-Parsing Solution [PP-StructureV3](./version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks.
23+
- 🧮 General Document-Parsing Solution [PP-StructureV3](./version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks. [Online Demo](https://aistudio.baidu.com/community/app/518494/webUI)
2424

25-
- 📈 Intelligent Document-Understanding Solution [PP-ChatOCRv4](./version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively powered by the WenXin large model 4.5T, achieving 15.7 percentage points higher accuracy than its predecessor.
25+
26+
- 📈 Intelligent Document-Understanding Solution [PP-ChatOCRv4](./version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively powered by the WenXin large model 4.5T, achieving 15.7 percentage points higher accuracy than its predecessor. [Online Demo](https://aistudio.baidu.com/community/app/518493/webUI)
2627

2728
In addition to providing an outstanding model library, PaddleOCR 3.0 also offers user-friendly tools covering model training, inference, and service deployment, so developers can rapidly bring AI applications to production.
2829
<div align="center">
