- π Generate a document parsed results using Document Intelligence, and output it in Markdown format. > output
- πΌοΈ Extract figures from documents and save them as PNG images. > output
- π€ Generate figure descriptions using Azure OpenAI Multimodal.
- π Update markdown outputs with generated descriptions. > output
- π Extract tables and convert them into Excel files. > output
- π Text Chunking to markdown ouputs using
MarkdownHeaderTextSplitter
,RecursiveContentChunker
, andSemanticContentChunker (TBD)
> markdown chuck output | recursive chunk output
python doc_intelli_workflow.py
- π Document Intelligence Official Samples: Python (v4.0) / RAG samples / Figure understanding.
- Build Intelligent RAG for Multimodality and Complex Document Structures
- nohanaga: Document Intelligence Samples | π Output: Article in Japanese
- π 7 Chunking Strategies for Langchain
- MarkdownHeaderTextSplitter
- β‘ Azure Functions vs. Indexers: AI Data Ingestion
- RAG Time: Mastering RAG