
[Hackathon 8th No.41] Reproduce the OmniParser paper in PaddleOCR #15582



Open. Wants to merge 1 commit into base branch main.


@robinbg commented Jun 4, 2025

…nition and KIE


paddle-bot bot commented Jun 4, 2025

Thanks for your contribution!

@GreatV requested review from Topdu, cuicheng01, and Copilot on June 4, 2025 23:44

Copilot AI left a comment

Pull Request Overview

This PR integrates the OmniParser unified framework into PaddleOCR by adding end-to-end support for data processing, model definition, inference, postprocessing, and documentation/configuration.

  • Introduces OmniParserPredictor and related inference tooling.
  • Adds unified model components: backbone, FPN neck, pixel/table/KIE heads, and multi-task loss.
  • Implements data augmentations, postprocessing logic, YAML configuration, and documentation for OmniParser.
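A "multi-task loss combining all heads" is often a weighted sum of the per-head losses. The sketch below illustrates only that general idea; the head names (`pixel`, `table`, `kie`), the default weight of 1.0, and the helper `combine_losses` are hypothetical and are not taken from `ppocr/losses/omniparser_loss.py`.

```python
from typing import Dict, Optional


def combine_losses(
    head_losses: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Weighted sum of per-head losses (hypothetical head names and weights)."""
    weights = weights or {}
    total = 0.0
    for name, value in head_losses.items():
        # Heads without an explicit weight default to 1.0 (an assumption).
        total += weights.get(name, 1.0) * value
    # Keep the individual terms alongside the total so they can be logged.
    return {"loss": total, **head_losses}


losses = combine_losses(
    {"pixel": 0.8, "table": 0.5, "kie": 0.2},
    weights={"pixel": 1.0, "table": 0.5, "kie": 2.0},
)
```

Returning the per-head terms with the total makes it easier to spot one task dominating training, which is a common concern when several heads share a backbone.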

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| tools/infer/predict_omniparser.py | Inference script for OmniParser |
| ppocr/postprocess/omniparser_postprocess.py | Postprocessing for text, table, and KIE outputs |
| ppocr/modeling/heads/omniparser_table_head.py | Table structure head |
| ppocr/modeling/heads/omniparser_pixel_head.py | Pixel-level text detection head |
| ppocr/modeling/heads/omniparser_kie_head.py | Key information extraction head |
| ppocr/modeling/backbones/omniparser_backbone.py | OmniParser backbone and FPN fusion |
| ppocr/modeling/architectures/omniparser.py | Unified OmniParser architecture |
| ppocr/losses/omniparser_loss.py | Multi-task loss combining all heads |
| ppocr/data/imaug/omniparser_process.py | Data processing and augmentation for OmniParser |
| docs/algorithm/omniparser/omniparser_en.md | English documentation for OmniParser |
| docs/algorithm/omniparser/omniparser.md | Chinese documentation for OmniParser |
| configs/omniparser/omniparser_base.yml | Configuration file for OmniParser training and inference |
Comments suppressed due to low confidence (2)

tools/infer/predict_omniparser.py:74

  • [nitpick] Argument det_model_dir is used both as a directory and a file path. Consider renaming to det_model_path or splitting into separate directory and file parameters for clarity.
model_file_path = args.det_model_dir
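One way to act on this nitpick is to accept either form and resolve it explicitly in one place. The sketch below assumes that a directory should map to a file named `inference.pdmodel`; the helper name `resolve_model_file` and that default file name are hypothetical, not taken from the PR.

```python
import os


def resolve_model_file(model_path: str, default_name: str = "inference.pdmodel") -> str:
    """Return a model file path whether given a directory or a file.

    `default_name` is a hypothetical file name used only when a directory is passed.
    """
    if os.path.isdir(model_path):
        # A directory was given: point at the assumed model file inside it.
        return os.path.join(model_path, default_name)
    # Anything else is treated as a file path and returned unchanged.
    return model_path
```

Splitting the argument into separate `model_dir` and `model_filename` parameters, as the comment also suggests, avoids the `isdir` check entirely at the cost of one extra flag.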

configs/omniparser/omniparser_base.yml:52

  • KIEHead.num_classes is set to 10, but the post-process classes list has only 6 entries. Align the number of classes in the head configuration with the post-process class list to avoid mismatches.
    num_classes: 10
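A cheap guard against this kind of mismatch is to validate the config when it is loaded. In this sketch, the nested keys `Architecture.KIEHead.num_classes` and `PostProcess.classes`, the helper `check_kie_classes`, and the placeholder class names are hypothetical simplifications, not the actual structure of `omniparser_base.yml`.

```python
from typing import Any, Dict, List


def check_kie_classes(config: Dict[str, Any]) -> None:
    """Raise ValueError if the KIE head and post-process disagree on class count.

    The key layout is a hypothetical simplification of the YAML config.
    """
    num_classes = config["Architecture"]["KIEHead"]["num_classes"]
    class_list: List[str] = config["PostProcess"]["classes"]
    if num_classes != len(class_list):
        raise ValueError(
            f"KIEHead.num_classes={num_classes} but PostProcess lists "
            f"{len(class_list)} classes"
        )


# Mirrors the reported mismatch: 10 head outputs vs. 6 post-process classes.
example_cfg = {
    "Architecture": {"KIEHead": {"num_classes": 10}},
    "PostProcess": {"classes": [f"class_{i}" for i in range(6)]},
}
try:
    check_kie_classes(example_cfg)
except ValueError as err:
    print(err)
```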

Comment on lines +84 to +87
text_score = preds['text_prob'][0, 0].numpy()
center_score = preds['center_prob'][0, 0].numpy()
border_score = preds['border_prob'][0, 0].numpy()


Copilot AI Jun 4, 2025


Calling .numpy() on a NumPy array will raise an AttributeError. Ensure preds['text_prob'] is a Paddle tensor before using .numpy(), or remove the .numpy() call and work with the array directly.

Suggested change
Before:
text_score = preds['text_prob'][0, 0].numpy()
center_score = preds['center_prob'][0, 0].numpy()
border_score = preds['border_prob'][0, 0].numpy()
After:
# Ensure compatibility with both Paddle tensors and NumPy arrays
text_prob = preds['text_prob'][0, 0]
center_prob = preds['center_prob'][0, 0]
border_prob = preds['border_prob'][0, 0]
text_score = text_prob.numpy() if isinstance(text_prob, paddle.Tensor) else text_prob
center_score = center_prob.numpy() if isinstance(center_prob, paddle.Tensor) else center_prob
border_score = border_prob.numpy() if isinstance(border_prob, paddle.Tensor) else border_prob

Copilot uses AI. Check for mistakes.
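The same fix can be factored into a small helper so that every probability map goes through one code path. This sketch uses duck typing on a `.numpy()` method instead of `isinstance(x, paddle.Tensor)`, so it does not need to import Paddle; the helper name `to_numpy` and the example shapes are hypothetical.

```python
import numpy as np


def to_numpy(x) -> np.ndarray:
    """Return x as a NumPy array, calling .numpy() only when x provides it."""
    if isinstance(x, np.ndarray):
        # Already an array: calling .numpy() here is what raised AttributeError.
        return x
    if hasattr(x, "numpy"):
        # Framework tensors such as paddle.Tensor expose a .numpy() method.
        return x.numpy()
    # Fall back for lists and other array-likes.
    return np.asarray(x)


# Hypothetical predictions that are already NumPy arrays, shaped [N, 1, H, W].
preds = {k: np.random.rand(1, 1, 4, 4) for k in ("text_prob", "center_prob", "border_prob")}
text_score = to_numpy(preds["text_prob"][0, 0])
center_score = to_numpy(preds["center_prob"][0, 0])
border_score = to_numpy(preds["border_prob"][0, 0])
```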

def _get_table_boundary(self, boundary_pred):
    """Extract table boundary from prediction"""
    # Obtain probability map for boundary
    boundary_prob = paddle.nn.functional.softmax(boundary_pred, axis=1)[0, 1].numpy()

Copilot AI Jun 4, 2025


Using paddle.nn.functional.softmax on a NumPy array will fail. Convert boundary_pred to a Paddle tensor or use a NumPy-based softmax implementation to process the array.

Suggested change
Before:
boundary_prob = paddle.nn.functional.softmax(boundary_pred, axis=1)[0, 1].numpy()
After:
boundary_pred_tensor = paddle.to_tensor(boundary_pred)
boundary_prob = paddle.nn.functional.softmax(boundary_pred_tensor, axis=1)[0, 1].numpy()
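The comment's other option, a NumPy-only softmax, avoids the round trip through `paddle.to_tensor`. A minimal sketch, assuming the boundary logits are laid out as `[N, C, H, W]` with the boundary class at channel index 1 (as the `[0, 1]` indexing in the original line suggests); the helper name `softmax_np` is hypothetical.

```python
import numpy as np


def softmax_np(logits: np.ndarray, axis: int = 1) -> np.ndarray:
    """Numerically stable softmax over `axis` using NumPy only."""
    # Subtracting the per-position max leaves the result unchanged but avoids overflow in exp.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Hypothetical boundary logits shaped [N, C, H, W] with C = 2 (background, boundary).
boundary_pred = np.random.randn(1, 2, 8, 8).astype(np.float32)
boundary_prob = softmax_np(boundary_pred, axis=1)[0, 1]  # [H, W] probability of class 1
```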


        logits.append(logit)
    else:
        # No regions case
        logits.append(paddle.zeros([0, self.num_classes]))

Copilot AI Jun 4, 2025


Attribute self.num_classes is not defined in __init__. Initialize self.num_classes = num_classes to avoid an AttributeError.
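The fix amounts to storing the constructor argument on the instance so that the empty-region branch can build a correctly shaped result. The framework-free sketch below uses NumPy in place of Paddle layers; the class `KIEHeadSketch`, its random linear classifier, and the per-image feature layout `[num_regions, in_channels]` are all hypothetical.

```python
import numpy as np


class KIEHeadSketch:
    """Hypothetical stand-in for the PR's KIE head, using NumPy instead of Paddle."""

    def __init__(self, in_channels: int, num_classes: int):
        self.in_channels = in_channels
        # The attribute the review says is missing from __init__.
        self.num_classes = num_classes
        # Fixed-seed random classifier weights, for illustration only.
        rng = np.random.default_rng(0)
        self.weight = rng.standard_normal((in_channels, num_classes)).astype(np.float32)

    def forward(self, region_features_per_image):
        logits = []
        for feats in region_features_per_image:  # each: [num_regions, in_channels]
            if len(feats) > 0:
                logits.append(feats @ self.weight)
            else:
                # No regions case: an empty [0, num_classes] array
                logits.append(np.zeros((0, self.num_classes), dtype=np.float32))
        return logits


head = KIEHeadSketch(in_channels=16, num_classes=6)
out = head.forward([np.ones((3, 16), np.float32), np.zeros((0, 16), np.float32)])
```

Keeping the empty case shaped `[0, num_classes]` lets downstream code concatenate per-image results without special-casing images that have no regions.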


Comment on lines +44 to +51
pre_process_list = [{
'OmniParserDataProcess': {
'image_shape': [1024, 1024],
'augmentation': False,
'mean': [0.485, 0.456, 0.406],
'std': [0.229, 0.224, 0.225],
}
}]

Copilot AI Jun 4, 2025


The variable pre_process_list is defined but never used. Remove it or integrate it into the preprocessing pipeline to avoid dead code.

Suggested change
Before:
pre_process_list = [{
    'OmniParserDataProcess': {
        'image_shape': [1024, 1024],
        'augmentation': False,
        'mean': [0.485, 0.456, 0.406],
        'std': [0.229, 0.224, 0.225],
    }
}]
After:
# Removed unused variable `pre_process_list`.
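If the values are meant to be used rather than deleted, they could drive a normalization step in the predictor. The sketch below is framework-free and rests on assumptions: it skips resizing to `image_shape`, ignores `augmentation`, and the helper `normalize_image` (scale to [0, 1], per-channel mean/std, HWC to CHW) is hypothetical rather than the actual behavior of `OmniParserDataProcess`.

```python
import numpy as np

# Values copied from the pre_process_list under review.
pre_process_list = [{
    'OmniParserDataProcess': {
        'image_shape': [1024, 1024],
        'augmentation': False,
        'mean': [0.485, 0.456, 0.406],
        'std': [0.229, 0.224, 0.225],
    }
}]


def normalize_image(img: np.ndarray, mean, std) -> np.ndarray:
    """Scale a uint8 HWC image to [0, 1], normalize per channel, return CHW float32."""
    x = img.astype(np.float32) / 255.0
    x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    # Move channels first, the usual layout for conv backbones.
    return x.transpose(2, 0, 1)


cfg = pre_process_list[0]['OmniParserDataProcess']
img = np.full((4, 4, 3), 255, dtype=np.uint8)  # tiny white test image
out = normalize_image(img, cfg['mean'], cfg['std'])
```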


@luotao1 (Collaborator) commented Jun 10, 2025

Please submit an RFC design document first.
