[Hackathon 8th No.41] Reproduce the OmniParser paper in PaddleOCR #15582
Thanks for your contribution!
Pull Request Overview
This PR integrates the OmniParser unified framework into PaddleOCR by adding end-to-end support for data processing, model definition, inference, postprocessing, and documentation/configuration.
- Introduces `OmniParserPredictor` and related inference tooling.
- Adds unified model components: backbone, FPN neck, pixel/table/KIE heads, and multi-task loss.
- Implements data augmentations, postprocessing logic, YAML configuration, and documentation for OmniParser.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
tools/infer/predict_omniparser.py | Inference script for OmniParser |
ppocr/postprocess/omniparser_postprocess.py | Postprocessing for text, table, and KIE outputs |
ppocr/modeling/heads/omniparser_table_head.py | Table structure head |
ppocr/modeling/heads/omniparser_pixel_head.py | Pixel-level text detection head |
ppocr/modeling/heads/omniparser_kie_head.py | Key information extraction head |
ppocr/modeling/backbones/omniparser_backbone.py | OmniParser backbone and FPN fusion |
ppocr/modeling/architectures/omniparser.py | Unified OmniParser architecture |
ppocr/losses/omniparser_loss.py | Multi-task loss combining all heads |
ppocr/data/imaug/omniparser_process.py | Data processing and augmentation for OmniParser |
docs/algorithm/omniparser/omniparser_en.md | English documentation for OmniParser |
docs/algorithm/omniparser/omniparser.md | Chinese documentation for OmniParser |
configs/omniparser/omniparser_base.yml | Configuration file for OmniParser training and inference |
Comments suppressed due to low confidence (2)
tools/infer/predict_omniparser.py:74
- [nitpick] Argument `det_model_dir` is used both as a directory and a file path. Consider renaming it to `det_model_path` or splitting it into separate directory and file parameters for clarity.
model_file_path = args.det_model_dir
configs/omniparser/omniparser_base.yml:52
- `KIEHead.num_classes` is set to 10, but the post-process `classes` list has only 6 entries. Align the number of classes in the head configuration with the post-process class list to avoid mismatches.
num_classes: 10
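A mismatch of this kind can be caught before training with a small consistency check. The following is a minimal sketch, not code from this PR: the function name `check_kie_classes` and the six example class names are assumptions made for illustration, and the actual contents of `omniparser_base.yml` are not reproduced here.

```python
def check_kie_classes(num_classes, classes):
    """Return a list of problems found between the head size and the label list."""
    problems = []
    if num_classes != len(classes):
        problems.append(
            f"KIEHead.num_classes={num_classes} but post-process defines "
            f"{len(classes)} classes"
        )
    if len(set(classes)) != len(classes):
        problems.append("post-process class list contains duplicates")
    return problems


# Hypothetical class names; only the counts (10 vs. 6) come from the review comment.
example_classes = ["other", "title", "key", "value", "header", "footer"]

for problem in check_kie_classes(10, example_classes):
    print(problem)
```

Running a check like this when the config is loaded turns a silent shape mismatch into an explicit error message.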
text_score = preds['text_prob'][0, 0].numpy()
center_score = preds['center_prob'][0, 0].numpy()
border_score = preds['border_prob'][0, 0].numpy()
Calling `.numpy()` on a NumPy array will raise an AttributeError. Ensure `preds['text_prob']` is a Paddle tensor before using `.numpy()`, or remove the `.numpy()` call and work with the array directly.
Suggested change:
- text_score = preds['text_prob'][0, 0].numpy()
- center_score = preds['center_prob'][0, 0].numpy()
- border_score = preds['border_prob'][0, 0].numpy()
+ # Ensure compatibility with both Paddle tensors and NumPy arrays
+ text_prob = preds['text_prob'][0, 0]
+ center_prob = preds['center_prob'][0, 0]
+ border_prob = preds['border_prob'][0, 0]
+ text_score = text_prob.numpy() if isinstance(text_prob, paddle.Tensor) else text_prob
+ center_score = center_prob.numpy() if isinstance(center_prob, paddle.Tensor) else center_prob
+ border_score = border_prob.numpy() if isinstance(border_prob, paddle.Tensor) else border_prob
Copilot uses AI. Check for mistakes.
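The three near-identical conditionals in the suggestion could be folded into one helper. The sketch below is an alternative under stated assumptions, not the reviewer's proposal: it uses duck typing (checking for a callable `numpy` attribute) instead of `isinstance(..., paddle.Tensor)`, and `to_numpy` and `FakeTensor` are hypothetical names introduced only so the example runs without Paddle installed.

```python
def to_numpy(value):
    """Return value.numpy() for tensor-like objects, otherwise value unchanged."""
    convert = getattr(value, "numpy", None)
    return convert() if callable(convert) else value


class FakeTensor:
    """Hypothetical stand-in for a framework tensor, used only for this demo."""

    def __init__(self, data):
        self._data = data

    def numpy(self):
        return self._data


# One entry is tensor-like, the other is already plain data.
preds = {"text_prob": FakeTensor([[0.9, 0.1]]), "center_prob": [[0.2, 0.8]]}
text_score = to_numpy(preds["text_prob"])
center_score = to_numpy(preds["center_prob"])
```

Because NumPy arrays have no `numpy` attribute, they pass through unchanged, which is the behavior the suggestion aims for.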
def _get_table_boundary(self, boundary_pred):
    """Extract table boundary from prediction"""
    # Obtain probability map for boundary
    boundary_prob = paddle.nn.functional.softmax(boundary_pred, axis=1)[0, 1].numpy()
Using `paddle.nn.functional.softmax` on a NumPy array will fail. Convert `boundary_pred` to a Paddle tensor or use a NumPy-based softmax implementation to process the array.
Suggested change:
- boundary_prob = paddle.nn.functional.softmax(boundary_pred, axis=1)[0, 1].numpy()
+ boundary_pred_tensor = paddle.to_tensor(boundary_pred)
+ boundary_prob = paddle.nn.functional.softmax(boundary_pred_tensor, axis=1)[0, 1].numpy()
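The other option named in the comment, a NumPy-based softmax, might look like the sketch below. It assumes `boundary_pred` is an array of logits shaped `[batch, 2, H, W]` with the boundary class at channel index 1 (inferred from the `[0, 1]` indexing in the original line); `softmax_np` is a hypothetical helper, not part of this PR.

```python
import numpy as np


def softmax_np(logits, axis=1):
    """Numerically stable softmax along `axis` for a NumPy array."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtracting the per-slice maximum avoids overflow in exp().
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Hypothetical logits: batch of 1, 2 classes, 2x3 spatial map.
boundary_pred = np.zeros((1, 2, 2, 3))
boundary_pred[0, 1] = 2.0  # boundary class scores higher everywhere

boundary_prob = softmax_np(boundary_pred, axis=1)[0, 1]
```

This avoids the round trip through `paddle.to_tensor` when the postprocessor already receives NumPy arrays.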
    logits.append(logit)
else:
    # No regions case
    logits.append(paddle.zeros([0, self.num_classes]))
Attribute `self.num_classes` is not defined in `__init__`. Initialize `self.num_classes = num_classes` to avoid an AttributeError.
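The fix amounts to storing the constructor argument on the instance so that later methods can read it. A minimal framework-free sketch, assuming the head receives `num_classes` as a constructor argument; `KIEHeadSketch`, `in_channels`, and `empty_logits_shape` are hypothetical names, and the real head would subclass a Paddle layer:

```python
class KIEHeadSketch:
    """Hypothetical stand-in for the KIE head, showing only attribute setup."""

    def __init__(self, in_channels, num_classes):
        self.in_channels = in_channels
        # Without this assignment, any later use of self.num_classes
        # raises AttributeError.
        self.num_classes = num_classes

    def empty_logits_shape(self):
        """Shape used for the 'no regions' case: zero rows, one column per class."""
        return [0, self.num_classes]


head = KIEHeadSketch(in_channels=256, num_classes=6)
shape = head.empty_logits_shape()
```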
pre_process_list = [{
    'OmniParserDataProcess': {
        'image_shape': [1024, 1024],
        'augmentation': False,
        'mean': [0.485, 0.456, 0.406],
        'std': [0.229, 0.224, 0.225],
    }
}]
The variable `pre_process_list` is defined but never used. Remove it or integrate it into the preprocessing pipeline to avoid dead code.
Suggested change:
- pre_process_list = [{
-     'OmniParserDataProcess': {
-         'image_shape': [1024, 1024],
-         'augmentation': False,
-         'mean': [0.485, 0.456, 0.406],
-         'std': [0.229, 0.224, 0.225],
-     }
- }]
+ # Removed unused variable `pre_process_list`.
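If the list is kept rather than deleted, it has to be turned into callable operators and applied to the input. The sketch below illustrates that alternative under loud assumptions: `NormalizeSketch`, `OP_REGISTRY`, `build_ops`, and `run_ops` are hypothetical names, the operator only normalizes a single RGB pixel, and PaddleOCR's own operator factory and the real `OmniParserDataProcess` are not reproduced here.

```python
class NormalizeSketch:
    """Hypothetical operator: per-channel (x - mean) / std on one RGB pixel."""

    def __init__(self, mean, std, **_unused):
        # Extra config keys (image_shape, augmentation) are accepted and ignored.
        self.mean = mean
        self.std = std

    def __call__(self, data):
        data["pixel"] = [
            (v - m) / s for v, m, s in zip(data["pixel"], self.mean, self.std)
        ]
        return data


# Maps config names to operator classes; the mapping itself is an assumption.
OP_REGISTRY = {"OmniParserDataProcess": NormalizeSketch}


def build_ops(op_param_list):
    """Instantiate one operator per single-key dict in the list."""
    ops = []
    for entry in op_param_list:
        name, params = next(iter(entry.items()))
        ops.append(OP_REGISTRY[name](**(params or {})))
    return ops


def run_ops(data, ops):
    """Apply operators in order, passing the data dict through each one."""
    for op in ops:
        data = op(data)
    return data


pre_process_list = [{
    "OmniParserDataProcess": {
        "image_shape": [1024, 1024],
        "augmentation": False,
        "mean": [0.485, 0.456, 0.406],
        "std": [0.229, 0.224, 0.225],
    }
}]

ops = build_ops(pre_process_list)
result = run_ops({"pixel": [0.485, 0.456, 0.406]}, ops)
```

With the list consumed by `build_ops`, the configuration is no longer dead code, and the mean and std values live in one place.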
Please submit an RFC design document first.
…nition and KIE