LangGraph Examples

A collection of examples demonstrating various LangGraph patterns and workflows with open source models.

Examples

1. GPU-Accelerated RAG (Retrieval-Augmented Generation)

File: langgraph_simple_rag.py

A complete example of building a RAG system using LangGraph with 100% open source models and GPU acceleration. No API keys required!

Features:

  • GPU-accelerated embeddings using sentence-transformers
  • Open source LLM (Microsoft Phi-2, 2.7B parameters)
  • FAISS vector store for fast similarity search
  • Stateful workflow orchestration with LangGraph
  • Automatic device detection (CUDA GPU / Apple Silicon / CPU)

Key Concepts:

  • State management with TypedDict
  • Sequential node execution (retrieve β†’ generate)
  • Integration with HuggingFace transformers
  • GPU optimization with torch.float16
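
To make these concepts concrete, here is a minimal sketch of the retrieve → generate graph (the node bodies are illustrative stubs, not the script's actual implementations):

from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph

class RAGState(TypedDict):
    question: str
    documents: List[str]
    answer: str

def retrieve(state: RAGState) -> dict:
    # Stub: the real node queries the FAISS vector store
    return {"documents": ["LangGraph is a library for stateful LLM workflows."]}

def generate(state: RAGState) -> dict:
    # Stub: the real node prompts the LLM with the retrieved context
    context = " ".join(state["documents"])
    return {"answer": f"(answer grounded in: {context})"}

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()

print(app.invoke({"question": "What is LangGraph?"})["answer"])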

2. Simple Chatbot

File: langgraph_simple_chatbot.ipynb

A basic chatbot implementation using LangGraph.

3. Simple Workflow

File: langgraph_simple_workflow.ipynb

Demonstrates basic workflow patterns in LangGraph.

Setup

Prerequisites

  • Python 3.10 or higher
  • GPU recommended (NVIDIA CUDA or Apple Silicon)
    • CPU-only mode works but will be slower
    • For NVIDIA: ~6GB VRAM for Phi-2 model
    • Alternatives: Use TinyLlama (1.1B) for lower VRAM

Installation

  1. Clone the repository:

git clone <repository-url>
cd langGraph_examples

  2. Install dependencies:

With GPU (CUDA):

# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other dependencies
uv sync

With CPU only:

uv sync

For Apple Silicon (M1/M2/M3):

# Standard PyTorch wheels already include MPS support
uv sync

Running the RAG Example

Quick Start

Simply run:

python langgraph_simple_rag.py

No API keys needed! The script will:

  1. Detect your GPU (or fall back to CPU)
  2. Download models automatically on first run (~5.6GB total)
  3. Create embeddings for sample documents
  4. Process example questions through the RAG pipeline
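
Step 1, device detection, typically boils down to a few lines like these (a sketch; the script's exact logic may differ):

import torch

if torch.cuda.is_available():
    device = "cuda"   # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps"    # Apple Silicon
else:
    device = "cpu"    # Fallback
print(f"Using device: {device}")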

First Run

On the first execution, models will be downloaded from HuggingFace Hub:

  • all-MiniLM-L6-v2 (embeddings): ~90MB
  • microsoft/phi-2 (LLM): ~5.5GB

These are cached locally (by default under ~/.cache/huggingface) for future runs.

Expected Output

πŸ”₯ GPU-Accelerated RAG with Open Source Models πŸ”₯

This example uses:
  β€’ Embeddings: all-MiniLM-L6-v2 (sentence-transformers)
  β€’ LLM: Microsoft Phi-2 (2.7B parameters)
  β€’ Vector Store: FAISS
  β€’ Orchestration: LangGraph

======================================================================
Simple RAG with LangGraph - Open Source LLM Edition (GPU Accelerated)
======================================================================
πŸš€ Using GPU: NVIDIA GeForce RTX 3080

1. Loading embeddings model...
βœ“ Embeddings model loaded on cuda

2. Creating vector store...
πŸ”„ Creating vector store and generating embeddings...
βœ“ Vector store created and populated

3. Loading open source LLM...
πŸ“₯ Loading LLM: microsoft/phi-2...
βœ“ LLM loaded on cuda

4. Creating RAG workflow graph...
βœ“ Graph created

======================================================================
Question 1: What is LangGraph?
======================================================================

πŸ“š Retrieving documents for: What is LangGraph?
βœ“ Retrieved 3 documents

πŸ€– Generating answer with open source LLM...
βœ“ Answer generated

πŸ“ Answer: [Generated answer about LangGraph]

How the RAG Example Works

The GPU-accelerated RAG pipeline:

User Question
     ↓
[Embeddings Model] β†’ Convert question to vector (GPU)
     ↓
[Vector Store] β†’ Retrieve top-k similar documents (FAISS)
     ↓
[LLM Model] β†’ Generate answer from context (GPU)
     ↓
   Answer
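
The retrieval half of this pipeline corresponds roughly to the following (a sketch; SAMPLE_DOCUMENTS refers to the list defined in the script, and k=3 matches the output above):

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},  # or "mps" / "cpu"
)
vectorstore = FAISS.from_texts(SAMPLE_DOCUMENTS, embeddings)
docs = vectorstore.similarity_search("What is LangGraph?", k=3)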

Performance:

  • GPU (CUDA): ~2-5 seconds per question
  • CPU: ~15-30 seconds per question

Customization

Using Different Open Source Models

Edit the create_llm() function in langgraph_simple_rag.py:

def create_llm(device="cuda"):
    # Choose your model:

    # Fast & lightweight (1.1B params, ~2GB VRAM)
    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

    # Balanced (2.7B params, ~6GB VRAM) - DEFAULT
    # model_name = "microsoft/phi-2"

    # Better quality (7B params, ~14GB VRAM)
    # model_name = "mistralai/Mistral-7B-Instruct-v0.2"

    # Very fast encoder-decoder (250M params, ~1GB VRAM)
    # Note: encoder-decoder models need the "text2text-generation"
    # pipeline task instead of "text-generation"
    # model_name = "google/flan-t5-base"

    # ... the rest of the function (tokenizer, model, pipeline) stays as-is

Using Different Embedding Models

Edit the create_embeddings() function:

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    # Fast and efficient - DEFAULT
    model_name="all-MiniLM-L6-v2",

    # Higher quality (slower)
    # model_name="sentence-transformers/all-mpnet-base-v2",

    # Multilingual support
    # model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
)
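
Note that a FAISS index is tied to the embedding dimension, so after switching embedding models you must rebuild the vector store rather than reuse an existing index.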

Adding Your Own Documents

Modify the SAMPLE_DOCUMENTS list:

SAMPLE_DOCUMENTS = [
    "Your first document here...",
    "Your second document here...",
    # Add more documents
]

Or load from files:

from langchain_community.document_loaders import TextLoader

loader = TextLoader("your_document.txt")
documents = loader.load()
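
For longer files, chunk the documents with langchain-text-splitters (already in the dependency list) before indexing; the chunk sizes below are illustrative, and the embeddings object is the one created earlier:

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
vectorstore = FAISS.from_documents(chunks, embeddings)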

Adjusting Retrieval Parameters

# Retrieve more documents for better context
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Use different search types
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Maximum Marginal Relevance
    search_kwargs={"k": 5, "fetch_k": 10}
)
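
With MMR, the retriever first fetches the fetch_k most similar candidates, then selects k of them for diversity, which helps when the corpus contains many near-duplicate documents.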

Tuning LLM Generation

Edit the pipeline parameters in create_llm():

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,        # Longer responses
    do_sample=True,            # Required for temperature/top_p to take effect
    temperature=0.7,           # Higher = more varied (use do_sample=False for deterministic output)
    top_p=0.9,                 # Nucleus sampling
    repetition_penalty=1.2,    # Reduce repetition
)
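
To plug the tuned pipeline back into LangChain, it can be wrapped like this (a sketch using the langchain-community integration):

from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=pipe)
print(llm.invoke("What is LangGraph?"))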

GPU Requirements & Recommendations

NVIDIA GPUs (CUDA)

Model            VRAM Required   Speed (per question)
TinyLlama-1.1B   ~2GB            ~1-2 seconds
Phi-2 (2.7B)     ~6GB            ~2-5 seconds
Mistral-7B       ~14GB           ~5-10 seconds

Apple Silicon (M1/M2/M3)

Works with the MPS (Metal Performance Shaders) backend. Performance is comparable to mid-range NVIDIA GPUs.

CPU-Only

All models work on CPU but will be significantly slower (10-30x). Recommended for testing only.

Troubleshooting

Out of Memory Error

Reduce the model size or the generation length:

# Use a smaller model
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Or reduce max tokens
max_new_tokens=128
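
Loading the model in half precision also roughly halves VRAM use (a sketch using transformers; device_map="auto" requires the accelerate package):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,  # half precision: ~half the memory of float32
    device_map="auto",          # places layers on GPU/CPU automatically
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")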

CUDA Not Available

Check PyTorch installation:

import torch
print(torch.cuda.is_available())
print(torch.version.cuda)

Reinstall PyTorch with CUDA:

pip install torch --index-url https://download.pytorch.org/whl/cu118

Models Download Slowly

Models are downloaded from HuggingFace Hub, so the first run will take time. Use a different mirror if needed:

export HF_ENDPOINT=https://hf-mirror.com

Dependencies

  • langgraph: Stateful workflow orchestration
  • langchain-community: Community integrations (vector stores, embeddings)
  • langchain-text-splitters: Document chunking utilities
  • faiss-cpu: Fast vector similarity search
  • torch: PyTorch for deep learning and GPU acceleration
  • transformers: HuggingFace transformers library
  • sentence-transformers: Embedding models
  • accelerate: Device placement and mixed-precision support for model loading

Learn More

LangGraph & LangChain

  • LangGraph documentation: https://langchain-ai.github.io/langgraph/
  • LangChain documentation: https://python.langchain.com/

Open Source Models

  • HuggingFace Hub: https://huggingface.co/models
  • sentence-transformers: https://www.sbert.net/

Vector Databases

  • FAISS: https://github.com/facebookresearch/faiss

License

MIT
