Commit f67e703

cp add_cls_id_layout_ratio (#3621)
1 parent d8ce81d commit f67e703

8 files changed, +83 -38 lines changed

docs/module_usage/tutorials/ocr_modules/layout_detection.en.md

Lines changed: 8 additions & 4 deletions
@@ -308,23 +308,25 @@ Relevant methods, parameters, and explanations are as follows:
 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>Scaling factor for the side length of the detection box; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>float/list/None</code></td>
+<td><code>float/list/dict/None</code></td>
 <td>
 <ul>
 <li><b>float</b>, a positive float number, e.g., 1.1, means expanding the width and height of the detection box by 1.1 times while keeping the center unchanged</li>
 <li><b>List</b>, e.g., [1.2, 1.5], means expanding the width by 1.2 times and the height by 1.5 times while keeping the center unchanged</li>
+<li><b>dict</b>, keys as <b>int</b> representing <code>cls_id</code>, values as <b>tuple</b>s of two float scaling factors (width, height), e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width of cls_id 0 detection boxes by 1.1 times and the height by 2.0 times while keeping the center unchanged</li>
 <li><b>None</b>, not specified, will use the default PaddleX official model configuration</li>
 </ul>
 </td>
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>Merging mode for the detection boxes output by the model; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>string/None</code></td>
+<td><code>string/dict/None</code></td>
 <td>
 <ul>
 <li><b>large</b>, when set to large, only the largest external box will be retained for overlapping detection boxes, and the internal overlapping boxes will be deleted</li>
 <li><b>small</b>, when set to small, only the smallest internal box will be retained for overlapping detection boxes, and the external overlapping boxes will be deleted</li>
 <li><b>union</b>, no filtering of boxes will be performed, and both internal and external boxes will be retained</li>
+<li><b>dict</b>, keys as <b>int</b> representing <code>cls_id</code> and values as merging modes, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for cls_id 0 detection boxes and the small mode for cls_id 2 detection boxes</li>
 <li><b>None</b>, not specified, will use the default PaddleX official model configuration</li>
 </ul>
 </td>
@@ -395,23 +397,25 @@ Relevant methods, parameters, and explanations are as follows:
 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>Scaling factor for the side length of the detection box; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>float/list/None</code></td>
+<td><code>float/list/dict/None</code></td>
 <td>
 <ul>
 <li><b>float</b>, a positive float number, e.g., 1.1, means expanding the width and height of the detection box by 1.1 times while keeping the center unchanged</li>
 <li><b>List</b>, e.g., [1.2, 1.5], means expanding the width by 1.2 times and the height by 1.5 times while keeping the center unchanged</li>
+<li><b>dict</b>, keys as <b>int</b> representing <code>cls_id</code>, values as <b>tuple</b>s of two float scaling factors (width, height), e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width of cls_id 0 detection boxes by 1.1 times and the height by 2.0 times while keeping the center unchanged</li>
 <li><b>None</b>, not specified, will use the <code>layout_unclip_ratio</code> parameter specified in <code>create_model</code>. If not specified in <code>create_model</code>, the default PaddleX official model configuration will be used</li>
 </ul>
 </td>
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>Merging mode for the detection boxes output by the model; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>string/None</code></td>
+<td><code>string/dict/None</code></td>
 <td>
 <ul>
 <li><b>large</b>, when set to large, only the largest external box will be retained for overlapping detection boxes, and the internal overlapping boxes will be deleted</li>
 <li><b>small</b>, when set to small, only the smallest internal box will be retained for overlapping detection boxes, and the external overlapping boxes will be deleted</li>
 <li><b>union</b>, no filtering of boxes will be performed, and both internal and external boxes will be retained</li>
+<li><b>dict</b>, keys as <b>int</b> representing <code>cls_id</code> and values as merging modes, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for cls_id 0 detection boxes and the small mode for cls_id 2 detection boxes</li>
 <li><b>None</b>, not specified, will use the <code>layout_merge_bboxes_mode</code> parameter specified in <code>create_model</code>. If not specified in <code>create_model</code>, the default PaddleX official model configuration will be used</li>
 </ul>
 </td>
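
The accepted forms of `layout_unclip_ratio` documented above can all be reduced to one (width, height) pair per class. The following is a minimal sketch of that reduction, not PaddleX's own code: `ratio_for_class` is a hypothetical helper name, and the rule that a class missing from the dict is left unexpanded is taken from the `unclip_boxes` change in this commit.

```python
from typing import Optional, Tuple, Union

Ratio = Union[float, Tuple[float, float], list, dict, None]


def ratio_for_class(ratio: Ratio, cls_id: int) -> Optional[Tuple[float, float]]:
    """Return the (width, height) factors that apply to `cls_id`, or None.

    Hypothetical helper for illustration; not part of the PaddleX API.
    """
    if ratio is None:
        return None  # nothing specified: the caller falls back to its default
    if isinstance(ratio, dict):
        pair = ratio.get(cls_id)  # classes not listed are left unchanged
        return None if pair is None else (float(pair[0]), float(pair[1]))
    if isinstance(ratio, float):
        return (ratio, ratio)  # one factor applied to both width and height
    if isinstance(ratio, (tuple, list)) and len(ratio) == 2:
        return (float(ratio[0]), float(ratio[1]))
    raise ValueError(f"Unsupported layout_unclip_ratio: {ratio!r}")


print(ratio_for_class(1.1, 0))              # (1.1, 1.1)
print(ratio_for_class([1.2, 1.5], 0))       # (1.2, 1.5)
print(ratio_for_class({0: (1.1, 2.0)}, 0))  # (1.1, 2.0)
print(ratio_for_class({0: (1.1, 2.0)}, 3))  # None
```

Resolving the ratio per class up front keeps the box arithmetic identical for every input form.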

docs/module_usage/tutorials/ocr_modules/layout_detection.md

Lines changed: 8 additions & 4 deletions
@@ -308,23 +308,25 @@ for res in output:
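 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>Scaling factor for the side length of the detection box; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>float/list/None</code></td>
+<td><code>float/list/dict/None</code></td>
 <td>
 <ul>
 <li><b>float</b>, a float greater than 0, e.g., 1.1, means expanding the width and height of the detection box output by the model by 1.1 times while keeping the center unchanged</li>
 <li><b>list</b>, e.g., [1.2, 1.5], means expanding the width by 1.2 times and the height by 1.5 times while keeping the center of the detection box output by the model unchanged</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>tuple</b>, e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width by 1.1 times and the height by 2.0 times for class 0 detection boxes output by the model while keeping the center unchanged</li>
 <li><b>None</b>, not specified, will use the default PaddleX official model configuration</li>
 </ul>
 </td>
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>Merging mode for the detection boxes output by the model; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>string/None</code></td>
+<td><code>string/dict/None</code></td>
 <td>
 <ul>
 <li><b>large</b>, when set to large, for detection boxes output by the model that overlap and contain one another, only the largest outer box is retained and the overlapping inner boxes are deleted.</li>
 <li><b>small</b>, when set to small, for detection boxes output by the model that overlap and contain one another, only the contained inner small box is retained and the overlapping outer boxes are deleted.</li>
 <li><b>union</b>, no filtering of boxes is performed; both inner and outer boxes are retained</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>str</b>, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for class 0 detection boxes and the small mode for class 2 detection boxes</li>
 <li><b>None</b>, not specified, will use the default PaddleX official model configuration</li>
 </ul>
 </td>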
@@ -402,23 +404,25 @@ for res in output:
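 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>Scaling factor for the side length of the detection box; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>float/list/None</code></td>
+<td><code>float/list/dict/None</code></td>
 <td>
 <ul>
 <li><b>float</b>, a float greater than 0, e.g., 1.1, means expanding the width and height of the detection box output by the model by 1.1 times while keeping the center unchanged</li>
 <li><b>list</b>, e.g., [1.2, 1.5], means expanding the width by 1.2 times and the height by 1.5 times while keeping the center of the detection box output by the model unchanged</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>tuple</b>, e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width by 1.1 times and the height by 2.0 times for class 0 detection boxes output by the model while keeping the center unchanged</li>
 <li><b>None</b>, not specified, will use the <code>layout_unclip_ratio</code> parameter specified in <code>create_model</code>. If not specified in <code>create_model</code> either, the default PaddleX official model configuration will be used</li>
 </ul>
 </td>
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>Merging mode for the detection boxes output by the model; if not specified, the default PaddleX official model configuration will be used</td>
-<td><code>string/None</code></td>
+<td><code>string/dict/None</code></td>
 <td>
 <ul>
 <li><b>large</b>, when set to large, for detection boxes output by the model that overlap and contain one another, only the largest outer box is retained and the overlapping inner boxes are deleted.</li>
 <li><b>small</b>, when set to small, for detection boxes output by the model that overlap and contain one another, only the contained inner small box is retained and the overlapping outer boxes are deleted.</li>
 <li><b>union</b>, no filtering of boxes is performed; both inner and outer boxes are retained</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>str</b>, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for class 0 detection boxes and the small mode for class 2 detection boxes</li>
 <li><b>None</b>, not specified, will use the <code>layout_merge_bboxes_mode</code> parameter specified in <code>create_model</code>. If not specified in <code>create_model</code> either, the default PaddleX official model configuration will be used</li>
 </ul>
 </td>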

docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.en.md

Lines changed: 4 additions & 2 deletions
@@ -595,11 +595,12 @@ The following are the parameters and their descriptions for the `visual_predict(
 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>The expansion coefficient for layout detection.</td>
-<td><code>float|Tuple[float,float]|None</code></td>
+<td><code>float|Tuple[float,float]|dict|None</code></td>
 <td>
 <ul>
 <li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
 <li><b>Tuple[float,float]</b>: The expansion coefficients in the horizontal and vertical directions, respectively;</li>
+<li><b>dict</b>: keys as <b>int</b> representing <code>cls_id</code>, values as <b>tuple</b>s of two float scaling factors (width, height), e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width of cls_id 0 detection boxes by 1.1 times and the height by 2.0 times while keeping the center unchanged;</li>
 <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>1.0</code>;</li>
 </ul>
 </td>
@@ -608,10 +609,11 @@ The following are the parameters and their descriptions for the `visual_predict(
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>The overlapping box filtering method.</td>
-<td><code>str|None</code></td>
+<td><code>str|dict|None</code></td>
 <td>
 <ul>
 <li><b>str</b>: large, small, union. Respectively representing retaining the large box, small box, or both when filtering overlapping boxes.</li>
+<li><b>dict</b>: keys as <b>int</b> representing <code>cls_id</code> and values as merging modes, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for cls_id 0 detection boxes and the small mode for cls_id 2 detection boxes;</li>
 <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>large</code>;</li>
 </ul>
 </td>

docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.md

Lines changed: 4 additions & 2 deletions
@@ -593,11 +593,12 @@ The prediction workflow, API description, and output description for PP-ChatOCRv3-doc are as follows:
 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>Expansion coefficient for layout detection</td>
-<td><code>float|Tuple[float,float]|None</code></td>
+<td><code>float|Tuple[float,float]|dict|None</code></td>
 <td>
 <ul>
 <li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
 <li><b>Tuple[float,float]</b>: The expansion coefficients in the horizontal and vertical directions, respectively;</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>tuple</b>, e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width by 1.1 times and the height by 2.0 times for class 0 detection boxes output by the model while keeping the center unchanged</li>
 <li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline is used by default, initialized to <code>1.0</code>;</li>
 </ul>
 </td>
@@ -606,10 +607,11 @@ The prediction workflow, API description, and output description for PP-ChatOCRv3-doc are as follows:
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>Overlapping box filtering method</td>
-<td><code>str|None</code></td>
+<td><code>str|dict|None</code></td>
 <td>
 <ul>
 <li><b>str</b>: large, small, union. Respectively representing retaining the large box, small box, or both when filtering overlapping boxes</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>str</b>, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for class 0 detection boxes and the small mode for class 2 detection boxes</li>
 <li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline is used by default, initialized to <code>large</code>;</li>
 </ul>
 </td>

docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.en.md

Lines changed: 4 additions & 2 deletions
@@ -672,11 +672,12 @@ The following are the parameters and descriptions of the `visual_predict()` meth
 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>The expansion coefficient for layout detection.</td>
-<td><code>float|Tuple[float,float]|None</code></td>
+<td><code>float|Tuple[float,float]|dict|None</code></td>
 <td>
 <ul>
 <li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
 <li><b>Tuple[float,float]</b>: The expansion coefficients in the horizontal and vertical directions, respectively;</li>
+<li><b>dict</b>: keys as <b>int</b> representing <code>cls_id</code>, values as <b>tuple</b>s of two float scaling factors (width, height) for each category;</li>
 <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>1.0</code>;</li>
 </ul>
 </td>
@@ -685,10 +686,11 @@ The following are the parameters and descriptions of the `visual_predict()` meth
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>The method for filtering overlapping bounding boxes.</td>
-<td><code>str|None</code></td>
+<td><code>str|dict|None</code></td>
 <td>
 <ul>
 <li><b>str</b>: large, small, union. Respectively representing retaining the larger box, smaller box, or both when overlapping boxes are filtered.</li>
+<li><b>dict</b>, keys as <b>int</b> representing <code>cls_id</code> and values as merging modes for each category.</li>
 <li><b>None</b>: If set to <code>None</code>, it will default to the value initialized by the pipeline, initialized to <code>large</code>;</li>
 </ul>
 </td>
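
To make the per-category `layout_merge_bboxes_mode` concrete, here is a simplified sketch of how a `{cls_id: mode}` dict could drive the filtering. It is not PaddleX's implementation: `contains` and `filter_overlaps` are hypothetical names, "overlap" is reduced to full geometric containment, the mode is chosen by the class of the box under consideration, and a class missing from the dict is assumed to behave like `union`.

```python
from typing import Dict, List, Tuple, Union

Box = Tuple[int, float, float, float, float]  # (cls_id, x1, y1, x2, y2)
Mode = Union[str, Dict[int, str]]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies fully inside `outer` (class ids are ignored)."""
    return (
        outer[1] <= inner[1]
        and outer[2] <= inner[2]
        and outer[3] >= inner[3]
        and outer[4] >= inner[4]
    )


def filter_overlaps(boxes: List[Box], mode: Mode) -> List[Box]:
    """Keep or drop nested boxes according to a global or per-class mode."""
    kept = []
    for i, box in enumerate(boxes):
        # Assumption: a class missing from the dict is treated as "union".
        box_mode = mode if isinstance(mode, str) else mode.get(box[0], "union")
        if box_mode not in ("large", "small", "union"):
            raise ValueError(f"Unknown merge mode: {box_mode!r}")
        others = [b for j, b in enumerate(boxes) if j != i]
        if box_mode == "large" and any(contains(o, box) for o in others):
            continue  # an enclosing box exists, so this inner box is dropped
        if box_mode == "small" and any(contains(box, o) for o in others):
            continue  # this box encloses another, so the outer box is dropped
        kept.append(box)
    return kept


boxes = [
    (0, 0, 0, 100, 100),      # class 0, outer
    (0, 10, 10, 20, 20),      # class 0, nested inside the first box
    (2, 200, 200, 300, 300),  # class 2, outer
    (2, 210, 210, 220, 220),  # class 2, nested inside the third box
    (1, 15, 15, 18, 18),      # class 1, not listed in the dict
]
result = filter_overlaps(boxes, {0: "large", 2: "small"})
print(result)
```

With `{0: "large", 2: "small"}`, the nested class 0 box and the outer class 2 box are dropped, while the class 1 box is kept because it has no entry. Two identical boxes contain each other under this simplified test, so a real implementation would need a tie-breaking rule.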

docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.md

Lines changed: 4 additions & 2 deletions
@@ -841,11 +841,12 @@ The prediction workflow, API description, and output description for PP-ChatOCRv4 are as follows:
 <tr>
 <td><code>layout_unclip_ratio</code></td>
 <td>Expansion coefficient for layout detection</td>
-<td><code>float|Tuple[float,float]|None</code></td>
+<td><code>float|Tuple[float,float]|dict|None</code></td>
 <td>
 <ul>
 <li><b>float</b>: Any floating-point number greater than <code>0</code>;</li>
 <li><b>Tuple[float,float]</b>: The expansion coefficients in the horizontal and vertical directions, respectively;</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>tuple</b>, e.g., <code>{0: (1.1, 2.0)}</code> means expanding the width by 1.1 times and the height by 2.0 times for class 0 detection boxes output by the model while keeping the center unchanged</li>
 <li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline is used by default, initialized to <code>1.0</code>;</li>
 </ul>
 </td>
@@ -854,10 +855,11 @@ The prediction workflow, API description, and output description for PP-ChatOCRv4 are as follows:
 <tr>
 <td><code>layout_merge_bboxes_mode</code></td>
 <td>Overlapping box filtering method</td>
-<td><code>str|None</code></td>
+<td><code>str|dict|None</code></td>
 <td>
 <ul>
 <li><b>str</b>: large, small, union. Respectively representing retaining the large box, small box, or both when filtering overlapping boxes</li>
+<li><b>dict</b>, keys are of type <b>int</b> and represent <code>cls_id</code>, values are of type <b>str</b>, e.g., <code>{0: "large", 2: "small"}</code> means using the large mode for class 0 detection boxes and the small mode for class 2 detection boxes</li>
 <li><b>None</b>: If set to <code>None</code>, the value initialized by the pipeline is used by default, initialized to <code>large</code>;</li>
 </ul>
 </td>

paddlex/inference/models/object_detection/predictor.py

Lines changed: 5 additions & 3 deletions
@@ -50,7 +50,7 @@ def __init__(
         img_size: Optional[Union[int, Tuple[int, int]]] = None,
         threshold: Optional[Union[float, dict]] = None,
         layout_nms: Optional[bool] = None,
-        layout_unclip_ratio: Optional[Union[float, Tuple[float, float]]] = None,
+        layout_unclip_ratio: Optional[Union[float, Tuple[float, float], dict]] = None,
         layout_merge_bboxes_mode: Optional[Union[str, dict]] = None,
         **kwargs,
     ):
@@ -91,9 +91,11 @@ def __init__(
                 assert (
                     len(layout_unclip_ratio) == 2
                 ), f"The length of `layout_unclip_ratio` should be 2."
+            elif isinstance(layout_unclip_ratio, dict):
+                pass
             else:
                 raise ValueError(
-                    f"The type of `layout_unclip_ratio` must be float or Tuple[float, float], but got {type(layout_unclip_ratio)}."
+                    f"The type of `layout_unclip_ratio` must be float, Tuple[float, float] or Dict, but got {type(layout_unclip_ratio)}."
                 )

         if layout_merge_bboxes_mode is not None:
@@ -209,7 +211,7 @@ def process(
         batch_data: List[Any],
         threshold: Optional[Union[float, dict]] = None,
         layout_nms: bool = False,
-        layout_unclip_ratio: Optional[Union[float, Tuple[float, float]]] = None,
+        layout_unclip_ratio: Optional[Union[float, Tuple[float, float], dict]] = None,
         layout_merge_bboxes_mode: Optional[Union[str, dict]] = None,
     ):
         """

paddlex/inference/models/object_detection/processors.py

Lines changed: 46 additions & 19 deletions
@@ -540,23 +540,48 @@ def unclip_boxes(boxes, unclip_ratio=None):
     if unclip_ratio is None:
         return boxes

-    widths = boxes[:, 4] - boxes[:, 2]
-    heights = boxes[:, 5] - boxes[:, 3]
-
-    new_w = widths * unclip_ratio[0]
-    new_h = heights * unclip_ratio[1]
-    center_x = boxes[:, 2] + widths / 2
-    center_y = boxes[:, 3] + heights / 2
-
-    new_x1 = center_x - new_w / 2
-    new_y1 = center_y - new_h / 2
-    new_x2 = center_x + new_w / 2
-    new_y2 = center_y + new_h / 2
-    expanded_boxes = np.column_stack(
-        (boxes[:, 0], boxes[:, 1], new_x1, new_y1, new_x2, new_y2)
-    )
+    if isinstance(unclip_ratio, dict):
+        expanded_boxes = []
+        for box in boxes:
+            class_id, score, x1, y1, x2, y2 = box
+            if class_id in unclip_ratio:
+                width_ratio, height_ratio = unclip_ratio[class_id]
+
+                width = x2 - x1
+                height = y2 - y1
+
+                new_w = width * width_ratio
+                new_h = height * height_ratio
+                center_x = x1 + width / 2
+                center_y = y1 + height / 2
+
+                new_x1 = center_x - new_w / 2
+                new_y1 = center_y - new_h / 2
+                new_x2 = center_x + new_w / 2
+                new_y2 = center_y + new_h / 2
+
+                expanded_boxes.append([class_id, score, new_x1, new_y1, new_x2, new_y2])
+            else:
+                expanded_boxes.append(box)
+        return np.array(expanded_boxes)

-    return expanded_boxes
+    else:
+        widths = boxes[:, 4] - boxes[:, 2]
+        heights = boxes[:, 5] - boxes[:, 3]
+
+        new_w = widths * unclip_ratio[0]
+        new_h = heights * unclip_ratio[1]
+        center_x = boxes[:, 2] + widths / 2
+        center_y = boxes[:, 3] + heights / 2
+
+        new_x1 = center_x - new_w / 2
+        new_y1 = center_y - new_h / 2
+        new_x2 = center_x + new_w / 2
+        new_y2 = center_y + new_h / 2
+        expanded_boxes = np.column_stack(
+            (boxes[:, 0], boxes[:, 1], new_x1, new_y1, new_x2, new_y2)
+        )
+        return expanded_boxes


 def iou(box1, box2):
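
One detail of the dict branch in `unclip_boxes` is worth spelling out: `class_id` is unpacked from a box row, so it presumably arrives as a floating-point value, while the documented dict keys are ints. The lookup still works because Python treats `0.0` and `0` as the same dict key. The sketch below mirrors the per-row arithmetic in plain Python, without NumPy, to show this on concrete numbers; `expand_rows` is an illustrative name, not a PaddleX function.

```python
from typing import Dict, List, Tuple


def expand_rows(
    rows: List[List[float]], unclip_ratio: Dict[int, Tuple[float, float]]
) -> List[List[float]]:
    """Expand [cls_id, score, x1, y1, x2, y2] rows around their centers."""
    expanded = []
    for cls_id, score, x1, y1, x2, y2 in rows:
        if cls_id not in unclip_ratio:  # the float 0.0 matches the int key 0
            expanded.append([cls_id, score, x1, y1, x2, y2])
            continue
        width_ratio, height_ratio = unclip_ratio[cls_id]
        center_x, center_y = (x1 + x2) / 2, (y1 + y2) / 2
        half_w = (x2 - x1) * width_ratio / 2
        half_h = (y2 - y1) * height_ratio / 2
        expanded.append(
            [cls_id, score, center_x - half_w, center_y - half_h,
             center_x + half_w, center_y + half_h]
        )
    return expanded


rows = [
    [0.0, 0.9, 10.0, 10.0, 30.0, 20.0],  # class 0: 20 wide, 10 high
    [1.0, 0.8, 0.0, 0.0, 4.0, 4.0],      # class 1: not in the dict
]
print(expand_rows(rows, {0: (1.5, 2.0)}))
# [[0.0, 0.9, 5.0, 5.0, 35.0, 25.0], [1.0, 0.8, 0.0, 0.0, 4.0, 4.0]]
```

The class 0 box grows from 20x10 to 30x20 around its center (20, 15), and the class 1 row passes through unchanged.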
@@ -687,8 +712,8 @@ def apply(
         img_size: Tuple[int, int],
         threshold: Union[float, dict],
         layout_nms: Optional[bool],
-        layout_unclip_ratio: Optional[Union[float, Tuple[float, float]]],
-        layout_merge_bboxes_mode: Optional[str],
+        layout_unclip_ratio: Optional[Union[float, Tuple[float, float], dict]],
+        layout_merge_bboxes_mode: Optional[Union[str, dict]],
     ) -> Boxes:
         """Apply post-processing to the detection boxes.
@@ -774,9 +799,11 @@ def apply(
                 assert (
                     len(layout_unclip_ratio) == 2
                 ), f"The length of `layout_unclip_ratio` should be 2."
+            elif isinstance(layout_unclip_ratio, dict):
+                pass
             else:
                 raise ValueError(
-                    f"The type of `layout_unclip_ratio` must be float or Tuple[float, float], but got {type(layout_unclip_ratio)}."
+                    f"The type of `layout_unclip_ratio` must be float, Tuple[float, float] or Dict[int, Tuple[float, float]], but got {type(layout_unclip_ratio)}."
                 )
             boxes = unclip_boxes(boxes, layout_unclip_ratio)
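
The `elif isinstance(layout_unclip_ratio, dict): pass` branch added here and in `predictor.py` accepts any dict without inspecting its contents, so a malformed entry would only surface later, inside `unclip_boxes`. As a hedged illustration of what an up-front check could look like, the sketch below validates keys and values; `validate_unclip_ratio_dict` is a hypothetical helper and is not part of this commit or of PaddleX.

```python
from typing import Dict, Tuple


def validate_unclip_ratio_dict(ratio: dict) -> Dict[int, Tuple[float, float]]:
    """Check a per-class ratio mapping and return a normalized copy.

    Hypothetical helper; the commit itself accepts any dict unchecked.
    """
    normalized = {}
    for cls_id, pair in ratio.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(cls_id, bool) or not isinstance(cls_id, int):
            raise TypeError(f"cls_id must be int, but got {type(cls_id)}.")
        if not isinstance(pair, (tuple, list)) or len(pair) != 2:
            raise ValueError(
                f"The ratio for cls_id {cls_id} must have length 2, but got {pair!r}."
            )
        width_ratio, height_ratio = float(pair[0]), float(pair[1])
        if width_ratio <= 0 or height_ratio <= 0:
            raise ValueError(f"Ratios for cls_id {cls_id} must be positive.")
        normalized[cls_id] = (width_ratio, height_ratio)
    return normalized


print(validate_unclip_ratio_dict({0: (1.1, 2.0), 2: [1.0, 1.5]}))
# {0: (1.1, 2.0), 2: (1.0, 1.5)}
```

Failing early with a message that names the offending `cls_id` is easier to debug than an unpacking error raised during post-processing.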