AI-Powered PDF Context Retrieval Chatbot (RAG) is a smart chatbot that lets you upload PDFs and ask questions about their content. Using advanced AI and semantic search, it finds and summarizes answers directly from your documents—ideal for legal, academic, business, and support tasks.

AI-Powered PDF Context Retrieval Chatbot (RAG) 🤖📄

Repository: Ashprogrammer29/AI-Powered-PDF-Context-Retrieval-Chatbot-RAG

Unlock intelligent context retrieval and querying from your PDFs using state-of-the-art AI! This Retrieval-Augmented Generation (RAG) chatbot uses FastAPI, LangChain, Google Gemini, and vector search for document Q&A.


🗂️ Files & Structure

File/Folder | Description
PDF Context Retrieval Chatbot.ipynb | Main Jupyter notebook: code, API, model setup, PDF ingestion, and querying logic
requirements[1].txt | Python dependencies (FastAPI, LangChain, vector DB, Google GenAI, etc.)
LICENSE | Boost Software License v1.0 (see below)

Key Notebook Functions:

  • PDF upload & text extraction (get_pdf_text)
  • Text chunking & vector store creation (create_vectorstore)
  • API endpoints (/process)
  • Utility scripts: file/folder handling, model config, embedding setup
  • Uses FastAPI for serving endpoints

⚡️ Quick Setup

  1. Clone the repo

    git clone https://github.com/Ashprogrammer29/AI-Powered-PDF-Context-Retrieval-Chatbot-RAG.git
    cd AI-Powered-PDF-Context-Retrieval-Chatbot-RAG
  2. Install dependencies

    python3 -m venv venv
    source venv/bin/activate
    pip install -r "requirements[1].txt"
  3. Configure Model

    • Set up your Google Gemini API key (via notebook or environment)
    • Place model configs/weights as needed
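
For the environment route, a minimal sketch of reading the key and failing early when it is missing. The helper name `load_gemini_api_key` is hypothetical and not taken from the notebook; `GOOGLE_API_KEY` is, to the best of my knowledge, the variable that langchain-google-genai reads by default.

```python
import os
from typing import Mapping

# Assumption: GOOGLE_API_KEY is the variable langchain-google-genai picks up by default.
API_KEY_VAR = "GOOGLE_API_KEY"


def load_gemini_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; the notebook may configure the key differently.
    """
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_VAR} before running the notebook or API server.")
    return key
```

Calling `load_gemini_api_key()` at the top of the notebook surfaces a missing key before any PDF is processed, rather than at the first model call.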

🚦 Running the Chatbot

📝 Jupyter Notebook

  • Open PDF Context Retrieval Chatbot.ipynb
  • Step through: mount storage, set up API keys, upload PDFs, run cells for context retrieval & Q&A

🌐 API Server (FastAPI)

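  • The FastAPI app is initialized inside the notebook. To serve it locally with uvicorn, the app object must be importable; the command below assumes it has been exported to a module named main.py:
    uvicorn main:app --reload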
  • Endpoints available for PDF upload & query

📡 API Endpoints

1. Upload PDF

  • Endpoint: /process
  • Method: POST
  • Payload: JSON (see notebook's File model)
{
  "files": ["https://example.com/file1.pdf", ...],
  "rewrite": true
}
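
A client-side sketch of sending the payload above, using only the standard library. `ProcessRequest` is a hypothetical stand-in for the notebook's File model (the notebook itself uses Pydantic), its field names simply mirror the JSON shown above, and the base URL assumes uvicorn's default local port 8000.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass


@dataclass
class ProcessRequest:
    """Hypothetical stand-in for the notebook's File model."""
    files: list[str]
    rewrite: bool

    def to_json(self) -> bytes:
        # Reject an empty upload before making a network call.
        if not self.files:
            raise ValueError("files must contain at least one PDF URL")
        return json.dumps(asdict(self)).encode("utf-8")


def post_process(request: ProcessRequest, base_url: str = "http://127.0.0.1:8000") -> str:
    """POST the payload to /process and return the raw response body."""
    req = urllib.request.Request(
        f"{base_url}/process",
        data=request.to_json(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `post_process(ProcessRequest(files=["https://example.com/file1.pdf"], rewrite=True))` would submit one PDF; the response format depends on the notebook's endpoint and is not assumed here.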

2. Query Context

  • Use the vectorstore and Q&A logic in the notebook to ask questions about uploaded PDFs.
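
To make the retrieval step concrete, here is a small sketch of ranking chunks against a query by cosine similarity. It is a brute-force stand-in for what the FAISS vector store does, not the notebook's code, and the three-dimensional vectors are toy values rather than real Gemini embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: (chunk text, made-up embedding).
chunks = [
    ("Termination requires 30 days notice.", [0.9, 0.1, 0.0]),
    ("Revenue grew in Q3.", [0.0, 0.2, 0.9]),
    ("Either party may terminate for breach.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedding of a termination question
results = top_k(query, chunks, k=2)
```

Here `results` holds the two termination-related chunks and leaves out the revenue one. A real vector store performs the same kind of ranking with an index, so it scales past a linear scan.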

🧠 Model & Config Notes

  • LLM: Google Gemini (via langchain-google-genai, API key required)
  • Embeddings: GoogleGenerativeAIEmbeddings + FAISS for semantic search
  • PDF Parsing: PyMuPDF (pymupdf)
  • Text Splitting: RecursiveCharacterTextSplitter from LangChain
  • API Models: Pydantic-based request bodies
  • Configurable: Chunk size, rewrite mode, user/session IDs, etc.
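
As an illustration of what the chunk size setting controls, the sketch below splits text into fixed-size windows with overlap. It is a deliberately simplified stand-in for LangChain's RecursiveCharacterTextSplitter, which additionally prefers to break on separators such as paragraphs and sentences; the default values are illustrative, not the notebook's.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap.

    Simplified illustration; defaults are assumptions, not the notebook's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Larger chunks give the model more surrounding context per hit but make retrieval coarser; the overlap reduces the chance that a sentence relevant to a question is cut in half at a boundary.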

📒 Example Usage

  1. Upload PDFs via /process endpoint or notebook cell
  2. Ask questions, for example "What is the summary of the document?" or "Find the legal clause about termination."
  3. Get answers with full context, citations, and semantic retrieval from your documents.
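
One way the final step could produce cited answers is to number the retrieved chunks in the prompt sent to the LLM. The sketch below shows that idea; `build_prompt`, the source labels, and the prompt wording are all hypothetical and are not taken from the notebook.

```python
def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Assemble an LLM prompt from (source_label, chunk_text) pairs in ranked order.

    Hypothetical helper: each chunk gets a bracketed number the model can cite.
    """
    context_lines = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

For example, `build_prompt("What is the notice period?", [("contract.pdf p.4", "Termination requires 30 days notice.")])` yields a prompt whose context section contains the line `[1] (contract.pdf p.4) Termination requires 30 days notice.`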

🎯 Use Cases

  • Legal Document Q&A ⚖️
    Instantly find clauses, obligations, or summaries from contracts and agreements.
  • Academic Research Assistant 🎓
    Extract findings, definitions, and references from research papers.
  • Business Report Analysis 📊
    Query for revenue, trends, and executive summaries in reports.
  • Technical Manual & FAQ Chatbot 🛠️
    Retrieve procedures and troubleshooting from manuals.
  • Compliance & Policy Checking 🏢
    Automate policy and HR queries from company documents.
  • Customer Support Automation 💬
    Answer product, feature, or troubleshooting questions from help docs.
  • Onboarding & Training 👩‍💼
    Enable instant answers for new employee training materials.

🙌 Contributing

Pull requests, issues, and suggestions are welcome! 🎉


📜 License

This project is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.

See the LICENSE file for details.


💡 Tech Stack

  • FastAPI 🚀
  • LangChain 🦜
  • Google Gemini 🤖
  • FAISS/Vector DB 🔍
  • PyMuPDF 📄
  • Pydantic 🛠️

🌟 Acknowledgements

Big thanks to the open-source AI/NLP community, LangChain, Google, and all contributors!


Made with ❤️ by Ashprogrammer29
