This project is a PDF-based RAG pipeline built with LangChain, Ollama, ChromaDB, and ElevenLabs. It lets you load PDFs, split them into chunks, attach metadata, store them in a vector database, and query them using a multi-query retrieval strategy — with optional text-to-speech output.
## Features

- PDF ingestion via `PDFPlumberLoader`
- Text chunking with overlap using `RecursiveCharacterTextSplitter`
- Metadata tagging (title, author, date) for filtering and better retrieval
- Vector storage in ChromaDB
- Multi-query retriever to improve recall
- LLM querying with Ollama (default: `llama3.2`)
- Text-to-speech output via ElevenLabs
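The chunking-with-overlap idea can be illustrated without LangChain. The stdlib-only sketch below is a simplification: the real `RecursiveCharacterTextSplitter` also tries to split on paragraph and sentence boundaries rather than raw character windows.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive character-window chunking. Each chunk shares `overlap`
    characters with the previous one so context isn't cut mid-thought."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2500-character text with the default settings yields 4 chunks,
# and adjacent chunks share their 200 boundary characters.
chunks = split_with_overlap("x" * 2500)
```

Overlap trades a little storage for retrieval quality: a sentence that straddles a chunk boundary still appears whole in at least one chunk.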
## Installation

```bash
# Clone the repository
git clone https://github.com/Sourav01112/voice-rag-pdf-assistant.git
cd voice-rag-pdf-assistant

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create a .env file for the ElevenLabs API key
echo "ELEVENLABS_API_KEY=your_api_key_here" > .env
```
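To read the key back at runtime, the project can use a package such as `python-dotenv`; for illustration, here is a minimal stdlib-only parser (the helper name `load_env` is hypothetical, not part of the repo):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env reader: KEY=VALUE lines become environment variables.
    Skips blanks and comments; python-dotenv handles quoting and more."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```

Using `setdefault` means a key already exported in the shell wins over the `.env` file, which is the conventional precedence.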
## Project Structure

```
voice-rag-pdf-assistant/
│── data/              # PDF files go here
│── db/vector_db/      # ChromaDB persistence
│── rag_system.py      # Main RAG pipeline
│── requirements.txt
│── README.md
```
## Usage

```bash
python rag_system.py
```
- Place your `.pdf` files inside the `data/` folder.
- The system will:
  - Load PDFs
  - Split into text chunks
  - Add metadata
  - Store embeddings in ChromaDB
  - Run a multi-query retrieval chain
  - Answer your question
  - Optionally convert the answer to speech
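The multi-query retrieval step can be pictured as: rephrase the question several ways, retrieve for each variant, and merge the results while dropping duplicates. This stdlib-only sketch shows only the merge logic; in the real pipeline LangChain's multi-query retriever has an LLM generate the variants and a ChromaDB similarity search stand in for the `corpus` lookup below, which is purely illustrative.

```python
def merge_multi_query(queries, retrieve):
    """Retrieve documents for each query variant and union the results
    in order, skipping duplicates. Several phrasings together recall
    more relevant chunks than any single phrasing alone."""
    seen, merged = set(), []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Illustrative stand-in for a vector-store similarity search.
corpus = {
    "what technologies are mentioned": ["doc_blockchain", "doc_ai"],
    "which tech does the report cover": ["doc_ai", "doc_fintech"],
}
results = merge_multi_query(corpus.keys(), lambda q: corpus[q])
# -> ["doc_blockchain", "doc_ai", "doc_fintech"]
```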
## Example Output

```
Loading PDF files...
Processing PDF file: report.pdf
Pages loaded: 12
Splitting text into chunks...
Created 45 text chunks
Adding metadata...
Setting up vector database...
Vector database created and populated
Setting up retrieval chain...
Retrieval chain setup complete
Query: Does the document mention any specific technologies?
Response: Yes, it discusses blockchain and AI applications in the financial sector.
Converting to speech...
```
## Requirements

- Python 3.9+
- Ollama installed locally
- ElevenLabs API key (optional for TTS)
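Ollama downloads a model the first time it is requested, so it can help to pull the default model ahead of time (the tag matches this README's default):

```shell
# Download the default model so the first query doesn't block on a pull
ollama pull llama3.2
```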