Tetaş AI Chatbot is an intelligent assistant designed to help users easily access product information from a set of Turkish PDF catalogs. Using OCR, chunking, vector embeddings, and OpenAI's GPT model, it enables natural language interaction with technical product data.
- ✏️ Extracts text from multiple Turkish PDF catalogs using OCR
- 📂 Splits text into meaningful chunks for accurate retrieval
- 🧠 Embeds chunks into a FAISS vector store for semantic search
- 🤖 Uses GPT-3.5-turbo to answer questions based on relevant document chunks
- 🛍️ Provides a Streamlit-based web interface for interactive Q&A
- Python 3.11
- LangChain
- OpenAI (GPT-3.5 Turbo)
- FAISS (Facebook AI Similarity Search)
- PyMuPDF + pytesseract for OCR
- Streamlit for the web interface
- dotenv for secure API key management
- Clone the Repository
git clone https://github.com/BartugKaan/tetasChatbot.git
cd tetasChatBot
- Install Requirements
pip install -r requirements.txt
If you don’t have FAISS installed, you may need:
pip install faiss-cpu # or faiss-gpu if supported
- Add Your OpenAI API Key Create a .env file in the root directory:
OPENAI_API_KEY=your_openai_key_here
Step 1: Process the PDFs (one-time)
python main.py
This step will extract text, chunk it, embed it, and save the FAISS vector store.
Step 2: Run the Chatbot Web App
streamlit run app.py
Go to http://localhost:8501 and start asking questions!
This project was developed as part of an AI engineering practice to explore Retrieval-Augmented Generation (RAG) pipelines using LangChain, OpenAI, and PDF data. Also special thanks for Tetaş for PDF Catalogs.
MIT License