Repository: Ashprogrammer29/AI-Powered-PDF-Context-Retrieval-Chatbot-RAG
Retrieve context from your PDFs and query them with AI. This Retrieval-Augmented Generation (RAG) chatbot uses FastAPI, LangChain, Google Gemini, and FAISS vector search for document Q&A.
File/Folder | Description |
---|---|
PDF Context Retrieval Chatbot.ipynb | Main Jupyter notebook: code, API, model setup, PDF ingestion, and querying logic |
requirements[1].txt | Python dependencies (FastAPI, LangChain, vector DB, Google GenAI, etc.) |
LICENSE | Boost Software License v1.0 (see below) |
Key Notebook Functions:
- PDF upload & text extraction (`get_pdf_text`)
- Text chunking & vector store creation (`create_vectorstore`)
- API endpoints (`/process`)
- Utility scripts: file/folder handling, model config, embedding setup
- Uses FastAPI for serving endpoints
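To illustrate the chunking step that precedes vector store creation, here is a minimal sketch using only the standard library. It is not the notebook's `create_vectorstore` code: the function name `chunk_text` and its default sizes are hypothetical, and it uses plain fixed-size character windows rather than LangChain's recursive, separator-aware splitting.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; the defaults are assumptions,
    not values taken from the notebook.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality.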
- Clone the repo

      git clone https://github.com/Ashprogrammer29/AI-Powered-PDF-Context-Retrieval-Chatbot-RAG.git
      cd AI-Powered-PDF-Context-Retrieval-Chatbot-RAG

- Install dependencies (the file name is quoted so the shell does not treat `[1]` as a glob pattern)

      python3 -m venv venv
      source venv/bin/activate
      pip install -r "requirements[1].txt"

- Configure the model
  - Set up your Google Gemini API key (via the notebook or an environment variable)
  - Place model configs/weights as needed
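One way to handle the environment-variable option is sketched below. The variable name `GOOGLE_API_KEY` is an assumption (the notebook may read the key differently), and `load_gemini_api_key` is a hypothetical helper, not a function from this repository.

```python
import os
from typing import Mapping


def load_gemini_api_key(
    env: Mapping[str, str] = os.environ,
    var_name: str = "GOOGLE_API_KEY",  # assumed name; adjust to match the notebook
) -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before starting the notebook or the API."
        )
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error raised on the first model call.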
- Open `PDF Context Retrieval Chatbot.ipynb`
- Step through the cells: mount storage, set up API keys, upload PDFs, then run the context retrieval & Q&A cells
- The FastAPI app is initialized inside the notebook. To run it locally, the app object must be importable as `main:app` (for example, from a `main.py` file containing the notebook's API code):

      uvicorn main:app --reload

- Endpoints are available for PDF upload & query
- Endpoint: `/process`
- Method: `POST`
- Payload: JSON (see the notebook's `File` model)

      {
        "files": ["https://example.com/file1.pdf", ...],
        "rewrite": true
      }
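As a rough sketch of how a client might call this endpoint, the snippet below builds the POST request with the standard library. The base URL `http://127.0.0.1:8000` assumes uvicorn's default host and port, and `build_process_request` is a hypothetical helper; the response format is not documented here, so the sketch does not try to parse it.

```python
import json
import urllib.request


def build_process_request(files, rewrite=True, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST request for the /process endpoint.

    base_url is an assumption: uvicorn's default local address.
    """
    payload = {"files": list(files), "rewrite": rewrite}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it once the server is running:
# req = build_process_request(["https://example.com/file1.pdf"])
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```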
- Use the vectorstore and Q&A logic in the notebook to ask questions about uploaded PDFs.
- LLM: Google Gemini (via `langchain-google-genai`, API key required)
- Embeddings: GoogleGenerativeAIEmbeddings + FAISS for semantic search
- PDF Parsing: PyMuPDF (`pymupdf`)
- Text Splitting: RecursiveCharacterTextSplitter from LangChain
- API Models: Pydantic-based request bodies
- Configurable: Chunk size, rewrite mode, user/session IDs, etc.
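The semantic search step can be pictured with a small standard-library sketch. This is a stand-in, not the project's code: brute-force cosine similarity replaces the FAISS index, the vectors are hand-written toy values rather than GoogleGenerativeAIEmbeddings output, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query.

    indexed_chunks is a list of (text, vector) pairs; a real vector store
    would hold embeddings produced by the embedding model instead.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy two-dimensional "embeddings", chosen by hand for illustration.
chunks = [
    ("termination clause", [1.0, 0.0]),
    ("revenue summary", [0.0, 1.0]),
    ("notice period", [0.8, 0.6]),
]
best = top_k([1.0, 0.1], chunks, k=2)
```

In a RAG pipeline, the retrieved chunks are then passed to the LLM as context alongside the user's question.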
- Upload PDFs via the `/process` endpoint or a notebook cell
- Ask questions such as "What is the summary of the document?" or "Find the legal clause about termination."
- Get answers with full context, citations, and semantic retrieval from your documents.
- Legal Document Q&A ⚖️
  Instantly find clauses, obligations, or summaries from contracts and agreements.
- Academic Research Assistant 🎓
  Extract findings, definitions, and references from research papers.
- Business Report Analysis 📊
  Query for revenue, trends, and executive summaries in reports.
- Technical Manual & FAQ Chatbot 🛠️
  Retrieve procedures and troubleshooting from manuals.
- Compliance & Policy Checking 🏢
  Automate policy and HR queries from company documents.
- Customer Support Automation 💬
  Answer product, feature, or troubleshooting questions from help docs.
- Onboarding & Training 👩‍💼
  Enable instant answers for new employee training materials.
Pull requests, issues, and suggestions are welcome! 🎉
This project is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.
See the LICENSE file for details.
- FastAPI 🚀
- LangChain 🦜
- Google Gemini 🤖
- FAISS/Vector DB 🔍
- PyMuPDF 📄
- Pydantic 🛠️
Big thanks to the open-source AI/NLP community, LangChain, Google, and all contributors!
Made with ❤️ by Ashprogrammer29