Commit 739400f

add slice op demo for quickstart (#12439)
1 parent c364821 commit 739400f

3 files changed: +101 -0 lines changed

doc/doc_ch/quickstart.md

Lines changed: 40 additions & 0 deletions
@@ -253,6 +253,46 @@ for idx in range(len(result)):
     im_show.save('result_page_{}.jpg'.format(idx))
 ```

+* Detection and recognition using sliding windows
+
+To perform optical character recognition (OCR) with sliding windows, use the following code snippet:
+
+```Python
+from paddleocr import PaddleOCR
+from PIL import Image, ImageDraw, ImageFont
+
+# Initialize the OCR engine
+ocr = PaddleOCR(use_angle_cls=True, lang="en")
+
+img_path = "./very_large_image.jpg"
+slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
+results = ocr.ocr(img_path, cls=True, slice=slice)
+
+# Load the image
+image = Image.open(img_path).convert("RGB")
+draw = ImageDraw.Draw(image)
+font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20) # Adjust the size as needed
+
+# Process and draw the results
+for res in results:
+    for line in res:
+        box = [tuple(point) for point in line[0]] # Convert the list of lists to a list of tuples
+        # Convert four corners to two corners
+        box = [(min(point[0] for point in box), min(point[1] for point in box)),
+               (max(point[0] for point in box), max(point[1] for point in box))]
+        txt = line[1][0]
+        draw.rectangle(box, outline="red", width=2) # Draw the rectangle
+        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font) # Draw the text above the rectangle
+
+# Save the result
+image.save("result.jpg")
+
+```
+
+This example initializes a PaddleOCR instance with angle classification enabled and sets the language to English. It then calls the `ocr` method with several parameters that customize detection and recognition, including the `slice` parameter for processing image slices.
+
+For a fuller explanation of the slice operation, see the [slice operation documentation](./slice.md).
+
 ## 3. Summary

 After working through this section, you should be comfortable using the PaddleOCR whl package and have obtained initial results.
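
Note on the quickstart snippet in this hunk: the step that converts four corners to two corners can be factored into a small helper. The following is a minimal sketch and not part of the commit. The names `quad_to_rect` and `label_origin` are hypothetical, and clamping the label position is an added assumption so that text for boxes near the top edge is not drawn at a negative y coordinate.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quad_to_rect(quad: Sequence[Sequence[float]]) -> List[Point]:
    """Return [(x_min, y_min), (x_max, y_max)] enclosing a four-point box."""
    xs = [float(p[0]) for p in quad]
    ys = [float(p[1]) for p in quad]
    return [(min(xs), min(ys)), (max(xs), max(ys))]


def label_origin(rect: Sequence[Point], text_height: float = 25.0) -> Point:
    """Position a label above the rectangle, clamped to stay inside the image."""
    (x_min, y_min), _ = rect
    return (x_min, max(0.0, y_min - text_height))


# A slightly rotated quadrilateral in the [[x, y], ...] layout read from line[0]
quad = [[10.0, 12.0], [110.0, 8.0], [112.0, 40.0], [12.0, 44.0]]
rect = quad_to_rect(quad)
print(rect)                # [(10.0, 8.0), (112.0, 44.0)]
print(label_origin(rect))  # (10.0, 0.0)
```

The returned list has the two-corner shape that the snippet passes to `draw.rectangle`, so the helper could replace the inline `min`/`max` expressions without changing the drawing calls.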

doc/doc_ch/slice.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Slice Operation
+
+To run PaddleOCR detection and recognition on a very large image or document, you can use the slice operation as follows:
+
+```python
+ocr_inst = PaddleOCR(**ocr_settings)
+results = ocr_inst.ocr(img, det=True, rec=True, slice=slice, cls=False, bin=False, inv=False, alpha_color=False)
+```
+
+where
+`slice = {'horizontal_stride': h_stride, 'vertical_stride': v_stride, 'merge_x_thres': x_thres, 'merge_y_thres': y_thres}`
+
+Here `h_stride`, `v_stride`, `x_thres`, and `y_thres` are user-configurable parameters that must be set manually. The slice operator runs a sliding window over the large image to create slices, then runs the OCR algorithm on each slice.
+
+The fragmented slice-level results are then merged to produce image-level detection and recognition results. The horizontal and vertical strides should not fall below a certain limit, because very low values create so many slices that computation becomes very time-consuming. For example, for an image of size 6616x14886, the following parameters are recommended:
+
+```python
+slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
+```
+
+All slice-level detections whose bounding boxes lie within `merge_x_thres` and `merge_y_thres` of each other are merged together.
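
The tiling and merging described in this file can be sketched in plain Python. This is an illustration only and not PaddleOCR's implementation. It assumes non-overlapping tiles whose size equals the stride, axis-aligned `(x_min, y_min, x_max, y_max)` boxes, and a merge rule based on the gap between two boxes. The names `tile_boxes`, `tile_count`, `close_enough`, and `merge_boxes` are hypothetical.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def tile_boxes(width: int, height: int, h_stride: int, v_stride: int) -> Iterator[Box]:
    """Yield tiles of at most h_stride x v_stride pixels, clipped to the image."""
    for top in range(0, height, v_stride):
        for left in range(0, width, h_stride):
            yield (left, top, min(left + h_stride, width), min(top + v_stride, height))


def tile_count(width: int, height: int, h_stride: int, v_stride: int) -> int:
    """Number of tiles, using integer ceiling division in each direction."""
    return ((width + h_stride - 1) // h_stride) * ((height + v_stride - 1) // v_stride)


def close_enough(a: Box, b: Box, x_thres: int, y_thres: int) -> bool:
    """True when the gap between two boxes is within both thresholds."""
    x_gap = max(a[0], b[0]) - min(a[2], b[2])  # negative when they overlap in x
    y_gap = max(a[1], b[1]) - min(a[3], b[3])  # negative when they overlap in y
    return x_gap <= x_thres and y_gap <= y_thres


def merge_boxes(boxes: List[Box], x_thres: int, y_thres: int) -> List[Box]:
    """Union pairs of nearby boxes, restarting the scan until no pair qualifies."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if close_enough(merged[i], merged[j], x_thres, y_thres):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Reading 6616x14886 as width x height (an assumption)
print(tile_count(6616, 14886, 300, 500))  # 690
print(tile_count(6616, 14886, 30, 50))    # 65858

# Two fragments split at a tile border plus one distant box
fragments = [(0, 0, 300, 40), (310, 5, 400, 38), (0, 900, 100, 930)]
print(merge_boxes(fragments, x_thres=50, y_thres=35))
# [(0, 0, 400, 40), (0, 900, 100, 930)]
```

Under these assumptions the recommended strides give 23 x 30 = 690 tiles, while strides ten times smaller give 65858, which illustrates why very small strides make the run slow.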

doc/doc_en/quickstart_en.md

Lines changed: 40 additions & 0 deletions
@@ -266,6 +266,46 @@ for idx in range(len(result)):
     im_show.save('result_page_{}.jpg'.format(idx))
 ```

+* Detection and Recognition Using Sliding Windows
+
+To perform OCR using sliding windows, the following code snippet can be employed:
+
+```Python
+from paddleocr import PaddleOCR
+from PIL import Image, ImageDraw, ImageFont
+
+# Initialize OCR engine
+ocr = PaddleOCR(use_angle_cls=True, lang="en")
+
+img_path = "./very_large_image.jpg"
+slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
+results = ocr.ocr(img_path, cls=True, slice=slice)
+
+# Load image
+image = Image.open(img_path).convert("RGB")
+draw = ImageDraw.Draw(image)
+font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20) # Adjust size as needed
+
+# Process and draw results
+for res in results:
+    for line in res:
+        box = [tuple(point) for point in line[0]] # Convert list of lists to list of tuples
+        # Convert four corners to two corners
+        box = [(min(point[0] for point in box), min(point[1] for point in box)),
+               (max(point[0] for point in box), max(point[1] for point in box))]
+        txt = line[1][0]
+        draw.rectangle(box, outline="red", width=2) # Draw rectangle
+        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font) # Draw text above the box
+
+# Save result
+image.save("result.jpg")
+
+```
+
+This example initializes the PaddleOCR instance with angle classification enabled and sets the language to English. The `ocr` method is then called with several parameters to customize the detection and recognition process, including the `slice` parameter for handling image slices.
+
+For a more comprehensive understanding of the slicing operation, please refer to the [slice operation documentation](./slice_en.md).
+
 <a name="3"></a>

 ## 3. Summary
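
Note on the result loop in this hunk: `for line in res` raises a `TypeError` if an entry of `results` is `None`, which some PaddleOCR versions may return when nothing is detected on a page (an assumption to verify against the installed version). The following is a minimal sketch of a defensive iterator and not part of the commit. The name `iter_text_lines` is hypothetical, `fake_results` is hand-written stand-in data, and each line is assumed to be `[box, (text, score)]`, consistent with the `line[0]` and `line[1][0]` accesses in the snippet.

```python
from typing import Iterator, List, Optional, Tuple


def iter_text_lines(results: List[Optional[list]]) -> Iterator[Tuple[list, str, float]]:
    """Yield (box, text, score) for each recognized line, skipping empty pages."""
    for page in results:
        if not page:  # None or [] when nothing was detected (assumed behaviour)
            continue
        for box, (text, score) in page:
            yield box, text, score


# Hand-written stand-in for the return value of ocr.ocr(...); not real OCR output
fake_results = [
    [
        [[[0, 0], [50, 0], [50, 20], [0, 20]], ("hello", 0.98)],
        [[[0, 30], [60, 30], [60, 50], [0, 50]], ("world", 0.91)],
    ],
    None,
]

for box, text, score in iter_text_lines(fake_results):
    print(f"{text!r} ({score:.2f}) at {box[0]}")
```

With a helper like this, the drawing code can loop over `iter_text_lines(results)` directly instead of nesting two loops over `results` and `res`.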
