```python
from chat import ask_question
from vector import create_faiss_index

# Load the saved FAISS index
faiss_index = create_faiss_index("index/faiss_index.json")

# Ask a question about the video content
answer = ask_question(faiss_index, "What is the main topic discussed?")
print(answer)
```

```text
Answer: The video discusses the advancements in nuclear fusion technology and its potential impact on future energy systems.
```
This project allows you to:
- Fetch YouTube video transcripts automatically.
- Split the transcript into manageable chunks.
- Generate vector embeddings using HuggingFace models.
- Store and retrieve embeddings using a FAISS vector store.
- Query the transcript intelligently with a Groq-powered LLM (`ChatGroq`).
- Save chat history in JSON format for later analysis.
The system ensures that responses are contextual, using only the transcript text for answering questions.
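The chunk-splitting step can be illustrated with a simplified splitter. This is a stand-in sketch, not the project's actual `RecursiveCharacterTextSplitter`, and the chunk sizes are arbitrary:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks (a simplified stand-in
    for LangChain's RecursiveCharacterTextSplitter)."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

transcript = "word " * 300  # hypothetical 1500-character transcript
chunks = chunk_text(transcript)
print(len(chunks))  # 4 overlapping chunks
```

The overlap ensures that a sentence straddling a chunk boundary remains fully retrievable from at least one chunk.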
- Automatic transcript fetching from YouTube videos.
- Recursive text splitting for long transcripts.
- FAISS-based semantic search and retrieval.
- HuggingFace `all-MiniLM-L6-v2` embeddings for semantic encoding.
- Integration with the Groq LLM (`llama-3.1-8b-instant`) for question answering.
- Logging of all user/AI interactions to `chat_history.json`.
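Semantic retrieval ranks stored chunks by vector similarity to the query embedding. A minimal pure-Python sketch of the idea, using toy 3-dimensional vectors in place of real 384-dimensional `all-MiniLM-L6-v2` embeddings (all numbers here are made up):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "index": transcript chunks mapped to embedding vectors
index = {
    "fusion reactors confine plasma": [0.9, 0.1, 0.0],
    "the weather was sunny today": [0.1, 0.8, 0.2],
}

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "What is the main topic?"
best = max(index, key=lambda chunk: cosine_similarity(query_vec, index[chunk]))
print(best)  # the fusion-related chunk ranks highest
```

FAISS performs the same kind of ranking at scale, using optimized index structures instead of a linear scan.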
- Python 3.10+
- YouTube Transcript API – fetch video transcripts.
- LangChain Community Libraries – `RecursiveCharacterTextSplitter`, `FAISS` vector store, `HuggingFaceEmbeddings`, `ChatGroq` (Groq LLM).
- FAISS – fast similarity search for embeddings.
- HuggingFace Embeddings – `all-MiniLM-L6-v2` semantic embeddings.
- Groq Cloud – access Groq LLMs via an API key.
```shell
git clone <repo-url>
cd <repo-folder>
pip install youtube-transcript-api langchain_text_splitters langchain_community langchain_groq
```

- Sign up at Groq Cloud.
- Generate an API key from your dashboard.
- When running the script, you will be prompted to input your Groq API key:
```python
import os, getpass

if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API key: ")
```

Run the main script:

```shell
python main.py
```

- Fetch a transcript from a YouTube video by providing the video ID.
- The transcript will be split, embedded, and stored in FAISS.
- You can query the video using natural language questions.
- Chat interactions are automatically logged to `chat_history.json`.
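The logging step can be sketched as follows. The file schema here (timestamped question/answer records in a single JSON array) is an assumption for illustration, not necessarily the project's exact format:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_interaction(question, answer, path="chat_history.json"):
    """Append one user/AI exchange to a JSON history file (illustrative schema)."""
    history_file = Path(path)
    history = json.loads(history_file.read_text()) if history_file.exists() else []
    history.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
    })
    history_file.write_text(json.dumps(history, indent=2))

log_interaction("What is the main topic?", "Advances in nuclear fusion.", path="demo_history.json")
```

Keeping the whole history in one JSON array means it can be reloaded for later analysis with a single `json.load` call.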
```python
from main import main_chain, invoke_with_logging

response = invoke_with_logging(main_chain, "Can you summarize the video in less than 20 words?")
print(response)
```
- `transcript_snippets.json` – raw transcript snippets.
- `script.json` – full transcript text.
- `chunks.json` – transcript split into chunks for embeddings.
- `faiss_index/` – saved FAISS index for semantic search.
- `retriever_config.json` – retriever settings.
- `chat_history.json` – logged conversation history.
- The project uses FAISS for fast retrieval and HuggingFace embeddings for semantic similarity.
- Groq LLM ensures high-quality question answering on video transcripts.
- All chat history is saved locally in JSON format for reproducibility.
- The system is robust to videos without captions (the `TranscriptsDisabled` exception is handled).
MIT License