Have we thought about adding pdfplumber for table detection? #97

### Description

Hi, I have been using this library. I would like to say thank you for the good work.

I have checked the PDF parsing algorithms we have, but I can't use any of them for some tasks at work:

- PyMuPDF: licensing issues.
- ML models, such as UniTable and Table Transformer: we are restricted from downloading open-source models at work.

I have managed to get pdfplumber working well for table extraction, and OpenParse with pdfminer.six for text extraction. I like how pdfminer.six extracts the text on my pages: it preserves bold formatting and line breaks.

On the other hand, I like how pdfplumber extracts the tables.
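For context, here is a minimal sketch of the pdfplumber side. It assumes pdfplumber's `Page.find_tables()` and `Table.extract()`; the names `table_to_markdown` and `extract_tables` are hypothetical helpers written for this illustration and are not part of OpenParse or pdfplumber.

```python
from typing import List, Optional, Sequence, Tuple

# (x0, top, x1, bottom) with the origin at the top-left, as pdfplumber reports it.
BBox = Tuple[float, float, float, float]


def table_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render rows (as returned by pdfplumber's Table.extract()) as a Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell: Optional[str]) -> str:
        # pdfplumber uses None for empty cells; newlines and pipes would break a row.
        return (cell or "").replace("\n", " ").replace("|", "\\|").strip()

    def fmt(row: Sequence[Optional[str]]) -> str:
        # Pad short rows so every line has the same number of columns.
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def extract_tables(pdf_path: str) -> List[Tuple[int, BBox, str]]:
    """Return (page_index, bbox, markdown) for each table pdfplumber finds."""
    import pdfplumber  # imported lazily so table_to_markdown works without it

    found = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_index, page in enumerate(pdf.pages):
            for table in page.find_tables():
                markdown = table_to_markdown(table.extract())
                found.append((page_index, table.bbox, markdown))
    return found
```

Treating the first row as the header is an assumption; some of my tables have no header row, so that part would need adjusting.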

Can I customize the library to use the two together?
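What I have in mind is roughly the following: take text blocks from pdfminer.six, then drop any block that falls mostly inside a table region reported by pdfplumber, so the table content is not emitted twice. This is only a sketch under assumptions. The helper names and the 0.5 overlap threshold are hypothetical, and the coordinate conversion assumes the page origin is at (0, 0), which may not hold for every PDF.

```python
from typing import List, Sequence, Tuple

# (x0, top, x1, bottom) with the origin at the top-left, as pdfplumber reports it.
BBox = Tuple[float, float, float, float]
TextBlock = Tuple[BBox, str]


def to_top_left(bbox_pdfminer: BBox, page_height: float) -> BBox:
    """Convert a pdfminer (x0, y0, x1, y1) box, origin bottom-left, to top-left origin."""
    x0, y0, x1, y1 = bbox_pdfminer
    return (x0, page_height - y1, x1, page_height - y0)


def overlap_ratio(inner: BBox, outer: BBox) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    ix0, itop, ix1, ibottom = inner
    ox0, otop, ox1, obottom = outer
    width = min(ix1, ox1) - max(ix0, ox0)
    height = min(ibottom, obottom) - max(itop, otop)
    area = (ix1 - ix0) * (ibottom - itop)
    if width <= 0 or height <= 0 or area <= 0:
        return 0.0
    return (width * height) / area


def drop_table_text(
    blocks: Sequence[TextBlock],
    table_bboxes: Sequence[BBox],
    threshold: float = 0.5,  # assumed cut-off, not tuned
) -> List[TextBlock]:
    """Keep only text blocks that are not mostly covered by any table box."""
    return [
        (bbox, text)
        for bbox, text in blocks
        if all(overlap_ratio(bbox, table) < threshold for table in table_bboxes)
    ]


def text_blocks(pdf_path: str, page_index: int) -> List[TextBlock]:
    """Collect pdfminer.six text containers for one page, in top-left coordinates."""
    from pdfminer.high_level import extract_pages  # imported lazily
    from pdfminer.layout import LTTextContainer

    for layout in extract_pages(pdf_path, page_numbers=[page_index]):
        return [
            (to_top_left(element.bbox, layout.height), element.get_text())
            for element in layout
            if isinstance(element, LTTextContainer)
        ]
    return []
```

The remaining text blocks and the table Markdown could then be sorted by their `top` coordinate to restore reading order. How to plug this into the library's pipeline is the part I am unsure about.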

Has anyone tried this? If so, what challenges did you face?

I plan to give it a go this weekend and will hopefully open a PR to the repo with my findings.

Cheers
