A collection of examples demonstrating various LangGraph patterns and workflows with open source models.
File: langgraph_simple_rag.py
A complete example of building a RAG system using LangGraph with 100% open source models and GPU acceleration. No API keys required!
Features:
- GPU-accelerated embeddings using sentence-transformers
- Open source LLM (Microsoft Phi-2, 2.7B parameters)
- FAISS vector store for fast similarity search
- Stateful workflow orchestration with LangGraph
- Automatic device detection (CUDA GPU / Apple Silicon / CPU)
Key Concepts:
- State management with TypedDict
- Sequential node execution (retrieve → generate); see the sketch after this list
- Integration with HuggingFace transformers
- GPU optimization with torch.float16
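To make these concepts concrete, here is a minimal sketch of the pattern used in langgraph_simple_rag.py. The state keys and node names (question, documents, answer, retrieve, generate) are assumptions based on the description above; the script's actual code may differ. The node bodies are stubs here (see the node sketches further below).

```python
from typing import List
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END


class RAGState(TypedDict):
    question: str
    documents: List[str]
    answer: str


def retrieve(state: RAGState) -> dict:
    # Stub: fetch documents relevant to the question
    return {"documents": ["LangGraph is a library for building stateful LLM workflows."]}


def generate(state: RAGState) -> dict:
    # Stub: produce an answer from the retrieved context
    return {"answer": "..."}


workflow = StateGraph(RAGState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)
workflow.add_edge(START, "retrieve")       # entry point
workflow.add_edge("retrieve", "generate")  # sequential execution
workflow.add_edge("generate", END)
graph = workflow.compile()

result = graph.invoke({"question": "What is LangGraph?", "documents": [], "answer": ""})
print(result["answer"])
```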
File: langgraph_simple_chatbot.ipynb
A basic chatbot implementation using LangGraph.
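A chatbot graph of this kind typically keeps a running message list in state. The snippet below is an illustrative sketch (the bot node is a stub), not the notebook's exact code:

```python
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class ChatState(TypedDict):
    # add_messages appends new messages to the list instead of overwriting it
    messages: Annotated[list, add_messages]


def chatbot(state: ChatState) -> dict:
    # Stub reply; in the notebook this would call an LLM with state["messages"]
    return {"messages": [{"role": "assistant", "content": "Hello from the bot!"}]}


builder = StateGraph(ChatState)
builder.add_node("chatbot", chatbot)
builder.add_edge(START, "chatbot")
builder.add_edge("chatbot", END)
chat_graph = builder.compile()

out = chat_graph.invoke({"messages": [{"role": "user", "content": "Hi!"}]})
```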
File: langgraph_simple_workflow.ipynb
Demonstrates basic workflow patterns in LangGraph.
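One common workflow pattern is conditional routing between nodes. The following is an illustrative sketch of that pattern (node names and the routing rule are invented for the example), not the notebook's exact contents:

```python
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END


class WorkflowState(TypedDict):
    text: str
    route: str


def classify(state: WorkflowState) -> dict:
    # Pick a branch based on the input (toy rule for illustration)
    return {"route": "long" if len(state["text"]) > 100 else "short"}


def summarize(state: WorkflowState) -> dict:
    return {"text": state["text"][:100] + "..."}


def passthrough(state: WorkflowState) -> dict:
    return {"text": state["text"]}


builder = StateGraph(WorkflowState)
builder.add_node("classify", classify)
builder.add_node("summarize", summarize)
builder.add_node("passthrough", passthrough)
builder.add_edge(START, "classify")
# Route to the next node based on the value returned by the condition function
builder.add_conditional_edges(
    "classify", lambda s: s["route"], {"long": "summarize", "short": "passthrough"}
)
builder.add_edge("summarize", END)
builder.add_edge("passthrough", END)
workflow_graph = builder.compile()
```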
- Python 3.10 or higher
- GPU recommended (NVIDIA CUDA or Apple Silicon)
- CPU-only mode works but will be slower
- For NVIDIA: ~6GB VRAM for Phi-2 model
- Alternatives: use TinyLlama (1.1B) for lower VRAM (see the hardware check sketched below)
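If you are unsure what hardware PyTorch can see, a quick check (assuming torch is already installed, as in the steps below):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"CUDA GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU (MPS backend) available")
else:
    print("No GPU detected; the example will fall back to CPU")
```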
- Clone the repository:
git clone <repository-url>
cd langGraph_examples
- Install dependencies:
With GPU (CUDA):
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Then install other dependencies
uv sync
With CPU only:
uv sync
For Apple Silicon (M1/M2/M3):
# PyTorch with MPS support
uv sync
Simply run:
python langgraph_simple_rag.py
No API keys needed! The script will:
- Detect your GPU (or fallback to CPU)
- Download models automatically on first run (~5GB total)
- Create embeddings for sample documents
- Process example questions through the RAG pipeline
On the first execution, models will be downloaded from HuggingFace Hub:
- all-MiniLM-L6-v2 (embeddings): ~90MB
- microsoft/phi-2 (LLM): ~5.5GB
These are cached locally for future runs; an optional pre-download sketch follows.
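If you prefer to fetch the models ahead of time (for example on a machine with a faster connection), a sketch using huggingface_hub, which is installed alongside transformers:

```python
# Optional: pre-download both models so the first run doesn't block on downloads
from huggingface_hub import snapshot_download

snapshot_download("sentence-transformers/all-MiniLM-L6-v2")  # embeddings model (~90MB)
snapshot_download("microsoft/phi-2")                         # LLM (~5.5GB)
```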
GPU-Accelerated RAG with Open Source Models
This example uses:
• Embeddings: all-MiniLM-L6-v2 (sentence-transformers)
• LLM: Microsoft Phi-2 (2.7B parameters)
• Vector Store: FAISS
• Orchestration: LangGraph
======================================================================
Simple RAG with LangGraph - Open Source LLM Edition (GPU Accelerated)
======================================================================
Using GPU: NVIDIA GeForce RTX 3080
1. Loading embeddings model...
✓ Embeddings model loaded on cuda
2. Creating vector store...
Creating vector store and generating embeddings...
✓ Vector store created and populated
3. Loading open source LLM...
Loading LLM: microsoft/phi-2...
✓ LLM loaded on cuda
4. Creating RAG workflow graph...
✓ Graph created
======================================================================
Question 1: What is LangGraph?
======================================================================
Retrieving documents for: What is LangGraph?
✓ Retrieved 3 documents
Generating answer with open source LLM...
✓ Answer generated
Answer: [Generated answer about LangGraph]
The GPU-accelerated RAG pipeline:
User Question
↓
[Embeddings Model] → Convert question to vector (GPU)
↓
[Vector Store] → Retrieve top-k similar documents (FAISS)
↓
[LLM Model] → Generate answer from context (GPU)
↓
Answer
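To make the diagram concrete, here is a minimal sketch of what the two nodes do. It follows the default models named in this README (all-MiniLM-L6-v2 and microsoft/phi-2), but the variable names, sample text, and prompt are illustrative; the script's actual code may differ.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from transformers import pipeline

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(
    ["LangGraph is a library for building stateful, multi-step LLM workflows."],
    embeddings,
)
llm = HuggingFacePipeline(
    pipeline=pipeline("text-generation", model="microsoft/phi-2", max_new_tokens=256)
)


def retrieve(state: dict) -> dict:
    # Embed the question and fetch the top-k most similar documents from FAISS
    docs = vectorstore.similarity_search(state["question"], k=3)
    return {"documents": [d.page_content for d in docs]}


def generate(state: dict) -> dict:
    # Stuff the retrieved context into a prompt and let the LLM answer
    context = "\n\n".join(state["documents"])
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {state['question']}\nAnswer:"
    )
    return {"answer": llm.invoke(prompt)}
```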
Performance:
- GPU (CUDA): ~2-5 seconds per question
- CPU: ~15-30 seconds per question
Edit the create_llm() function in langgraph_simple_rag.py:
def create_llm(device="cuda"):
# Choose your model:
# Fast & lightweight (1.1B params, ~2GB VRAM)
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# Balanced (2.7B params, ~6GB VRAM) - DEFAULT
# model_name = "microsoft/phi-2"
# Better quality (7B params, ~14GB VRAM)
# model_name = "mistralai/Mistral-7B-Instruct-v0.2"
# Very fast encoder-decoder (250M params, ~1GB VRAM)
# model_name = "google/flan-t5-base"Edit the create_embeddings() function:
embeddings = HuggingFaceEmbeddings(
# Fast and efficient - DEFAULT
model_name="all-MiniLM-L6-v2",
# Higher quality (slower)
# model_name="sentence-transformers/all-mpnet-base-v2",
# Multilingual support
# model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
)
Modify the SAMPLE_DOCUMENTS list:
SAMPLE_DOCUMENTS = [
"Your first document here...",
"Your second document here...",
# Add more documents
]
Or load from files:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("your_document.txt")
documents = loader.load()
# Retrieve more documents for better context
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# Use different search types
retriever = vectorstore.as_retriever(
search_type="mmr", # Maximum Marginal Relevance
search_kwargs={"k": 5, "fetch_k": 10}
)
Edit the pipeline parameters in create_llm():
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_new_tokens=512, # Longer responses
temperature=0.7, # More creative (0.0 = deterministic)
top_p=0.9, # Nucleus sampling
repetition_penalty=1.2, # Reduce repetition
)
| Model | VRAM Required | Speed (per question) |
|---|---|---|
| TinyLlama-1.1B | ~2GB | ~1-2 seconds |
| Phi-2 (2.7B) | ~6GB | ~2-5 seconds |
| Mistral-7B | ~14GB | ~5-10 seconds |
On Apple Silicon, the example runs on the MPS (Metal Performance Shaders) backend, with performance roughly comparable to a mid-range NVIDIA GPU.
All models work on CPU but will be significantly slower (10-30x). Recommended for testing only.
If you run out of GPU memory, reduce the model size or the number of generated tokens:
# Use a smaller model
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# Or reduce max tokens
max_new_tokens=128
If CUDA is not being used, check your PyTorch installation:
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)
Reinstall PyTorch with CUDA:
pip install torch --index-url https://download.pytorch.org/whl/cu118
Models are downloaded from HuggingFace Hub. First run will take time. Use a different mirror if needed:
export HF_ENDPOINT=https://hf-mirror.com
- langgraph: Stateful workflow orchestration
- langchain-community: Community integrations (vector stores, embeddings)
- langchain-text-splitters: Document chunking utilities
- faiss-cpu: Fast vector similarity search
- torch: PyTorch for deep learning and GPU acceleration
- transformers: HuggingFace transformers library
- sentence-transformers: Embedding models
- accelerate: Distributed and mixed-precision training
MIT