Open
Description
Hi, I have been using this library. I would like to say thank you for the good work.
I have looked at the PDF parsing algorithms the library offers, but I can't use any of them for some of my tasks at work:
- PyMuPDF: licensing issues.
- VLLM models such as UniTable and Table Transformer: we are restricted from downloading open-source models at work.
I have managed to get pdfplumber working well for table extraction, and OpenParse with pdfminer.six working well for text extraction. I like how pdfminer.six extracts the text from my pages (it preserves bold formatting and line breaks).
On the other hand, I like how pdfplumber extracts the tables.
Can I customize the library to use both together, with pdfminer.six for text and pdfplumber for tables?
Has anyone tried this? If so, what challenges did you face?
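For discussion, here is a rough sketch of how the two outputs might be merged. It is not OpenParse's API: `Block`, `merge_blocks`, and the `threshold` value are hypothetical names and choices made up for illustration, and the bounding boxes are hard-coded rather than read from a real PDF. The one thing it relies on is that pdfplumber reports boxes as `(x0, top, x1, bottom)` measured from the top of the page, while pdfminer.six layout objects use `(x0, y0, x1, y1)` with the origin at the bottom-left, so the text boxes have to be flipped before they can be compared with the table boxes.

```python
from dataclasses import dataclass

# (x0, top, x1, bottom) with the origin at the top-left, as pdfplumber reports it.
BBox = tuple[float, float, float, float]


@dataclass
class Block:
    """Hypothetical container for one extracted element."""
    kind: str        # "text" or "table"
    bbox: BBox
    content: object  # a string for text, a list of rows for a table


def pdfminer_to_top_left(bbox, page_height):
    """Flip a pdfminer.six bbox (bottom-left origin) to top-left origin."""
    x0, y0, x1, y1 = bbox
    return (x0, page_height - y1, x1, page_height - y0)


def overlap_ratio(inner, outer):
    """Fraction of `inner`'s area that lies inside `outer`."""
    ix0, itop, ix1, ibottom = inner
    ox0, otop, ox1, obottom = outer
    width = min(ix1, ox1) - max(ix0, ox0)
    height = min(ibottom, obottom) - max(itop, otop)
    area = (ix1 - ix0) * (ibottom - itop)
    if width <= 0 or height <= 0 or area <= 0:
        return 0.0
    return (width * height) / area


def merge_blocks(text_blocks, table_blocks, threshold=0.5):
    """Drop text that mostly sits inside a table, then sort top-to-bottom."""
    kept = [
        t for t in text_blocks
        if all(overlap_ratio(t.bbox, tb.bbox) < threshold for tb in table_blocks)
    ]
    return sorted(kept + list(table_blocks), key=lambda b: (b.bbox[1], b.bbox[0]))


# Made-up example: a US Letter page (792 pt tall) with one table.
page_height = 792.0
texts = [
    Block("text", pdfminer_to_top_left((72, 700, 300, 720), page_height), "Intro paragraph"),
    Block("text", pdfminer_to_top_left((80, 510, 200, 525), page_height), "cell text"),
]
tables = [Block("table", (72, 250, 540, 400), [["a", "b"], ["1", "2"]])]
merged = merge_blocks(texts, tables)
print([b.kind for b in merged])
```

In a real run, the table boxes would presumably come from pdfplumber's `page.find_tables()` (each table has a `bbox` and an `extract()` method) and the text boxes from pdfminer.six layout objects. How the merged blocks would then be fed back into OpenParse's own node types is the part still to be worked out.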
I am going to give it a go this weekend and will hopefully open a PR to the repo with my findings.
Cheers