An on-premises, OCR-free unstructured data extraction and benchmarking toolkit.
Updated May 13, 2025 - Python
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the underlying visual template that the documents share.