🤖 CodeRAG: AI-Powered Code Retrieval & Assistance

Note: This POC was innovative for its time, but modern tools like Cursor and Windsurf now apply this principle directly in IDEs. This remains an excellent educational project for understanding RAG implementation.

✨ What is CodeRAG?

CodeRAG uses Retrieval-Augmented Generation (RAG) to provide coding assistance grounded in your own code. Instead of relying only on what fits in a limited context window, it indexes your codebase and retrieves the relevant parts of the project for each query.

🎯 Core Idea

Most coding assistants see only a limited slice of your code. CodeRAG aims to make the whole project available as context through:

Real-time indexing of your codebase using FAISS vector search
Semantic code search powered by OpenAI embeddings
Contextual AI responses grounded in the code retrieved from your project

🚀 Quick Start

Prerequisites

Python 3.11+
OpenAI API key

Installation

# Clone the repository
git clone https://github.com/your-username/CodeRAG.git
cd CodeRAG

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp example.env .env
# Edit .env with your OpenAI API key and settings

Configuration

Create a .env file with your settings:

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_CHAT_MODEL=gpt-4
WATCHED_DIR=/path/to/your/code/directory
FAISS_INDEX_FILE=./coderag_index.faiss
EMBEDDING_DIM=1536
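
As an illustration of how these settings might be consumed, the sketch below reads them with the standard library only. The `Settings` dataclass and `load_settings` function are hypothetical names, not taken from `coderag/config.py`, and the sketch assumes the `.env` values have already been loaded into the process environment. The fallback values mirror the example above, except the `WATCHED_DIR` fallback of `.`, which is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values listed in .env."""
    openai_api_key: str
    embedding_model: str
    chat_model: str
    watched_dir: str
    faiss_index_file: str
    embedding_dim: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early: both embedding and chat calls need the key.
        raise ValueError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002"),
        chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4"),
        watched_dir=env.get("WATCHED_DIR", "."),  # assumed fallback
        faiss_index_file=env.get("FAISS_INDEX_FILE", "./coderag_index.faiss"),
        # EMBEDDING_DIM has to match the embedding model's output size
        # (1536 for text-embedding-ada-002).
        embedding_dim=int(env.get("EMBEDDING_DIM", "1536")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching real credentials.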

Running CodeRAG

# Start the backend (indexing and monitoring)
python main.py

# In a separate terminal, start the web interface
streamlit run app.py

📖 How It Works

graph LR
    A[Code Files] --> B[File Monitor]
    B --> C[OpenAI Embeddings]
    C --> D[FAISS Vector DB]
    E[User Query] --> F[Semantic Search]
    D --> F
    F --> G[Retrieved Context]
    G --> H[OpenAI GPT]
    H --> I[AI Response]

Indexing: CodeRAG monitors your code directory and generates embeddings for Python files
Storage: Embeddings are stored in a FAISS vector database with metadata
Search: User queries are embedded and matched against the code database
Generation: Retrieved code context is sent to GPT models for intelligent responses
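
The four steps above can be sketched end to end in a few lines. This is a toy illustration, not the project's code: `toy_embed` replaces the OpenAI embedding call with a bag-of-words vector, `ToyIndex` replaces FAISS with a brute-force cosine search, and `build_prompt` only assembles the text that would be sent to the chat model. All of these names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: token counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyIndex:
    """Stand-in for the vector database: vectors plus per-file metadata."""

    def __init__(self):
        self.vectors = []
        self.metadata = []

    def add(self, path, content):
        # Indexing + storage: embed the file and keep its metadata alongside.
        self.vectors.append(toy_embed(content))
        self.metadata.append({"path": path, "content": content})

    def search(self, query, k=2):
        # Search: embed the query and rank stored files by similarity.
        q = toy_embed(query)
        scored = [
            (cosine(q, vec), meta)
            for vec, meta in zip(self.vectors, self.metadata)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(query, hits):
    """Generation input: retrieved code followed by the user's question."""
    context = "\n\n".join(f"# {m['path']}\n{m['content']}" for _, m in hits)
    return f"Use the following code as context:\n\n{context}\n\nQuestion: {query}"
```

A real embedding model captures meaning rather than exact token overlap, so this sketch only shows the shape of the flow: embed, store, search, then prompt.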

🛠️ Architecture

CodeRAG/
├── 🧠 coderag/           # Core RAG functionality  
│   ├── config.py         # Environment configuration
│   ├── embeddings.py     # OpenAI embedding generation
│   ├── index.py          # FAISS vector operations
│   ├── search.py         # Semantic code search
│   └── monitor.py        # File system monitoring
├── 🌐 app.py            # Streamlit web interface
├── 🔧 main.py           # Backend indexing service
├── 🔗 prompt_flow.py    # RAG pipeline orchestration
└── 📋 requirements.txt   # Dependencies

Key Components

🔍 Vector Search: FAISS-powered similarity search for code retrieval
🎯 Smart Embeddings: OpenAI embeddings capture semantic code meaning
📡 Real-time Updates: Watchdog monitors file changes for live indexing
💬 Conversational UI: Streamlit interface with chat-like experience
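
To make the live-indexing idea concrete, here is a minimal sketch of detecting which Python files need re-embedding. It swaps Watchdog's event-driven monitoring for a simple polling comparison using only the standard library, so it illustrates the principle rather than how `coderag/monitor.py` works. `snapshot` and `diff_snapshots` are hypothetical helper names.

```python
import os


def snapshot(root):
    """Map each .py file under root to its last-modified time in nanoseconds."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                state[path] = os.stat(path).st_mtime_ns
    return state


def diff_snapshots(old, new):
    """Return (created, modified, deleted) path lists between two snapshots."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted
```

Created and modified files would be re-embedded and deleted ones dropped from the index. An event-based watcher avoids rescanning the tree on every pass, which matters for large projects.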

🎪 Usage Examples

Ask About Your Code

"How does the FAISS indexing work in this codebase?"
"Where is error handling implemented?"
"Show me examples of the embedding generation process"

Get Improvements

"How can I optimize the search performance?"
"What are potential security issues in this code?"
"Suggest better error handling for the monitor module"

Debug Issues

"Why might the search return no results?"  
"How do I troubleshoot OpenAI connection issues?"
"What could cause indexing to fail?"

⚙️ Development

Code Quality Tools

# Install pre-commit hooks
pip install pre-commit
pre-commit install

# Run formatting and linting
black .
flake8 .
mypy .

Testing

# Test FAISS index functionality
python tests/test_faiss.py

# Test individual components
python scripts/initialize_index.py
python scripts/run_monitor.py

🐛 Troubleshooting

Common Issues

Search returns no results

Check that indexing completed: look for the coderag_index.faiss file (or the path set in FAISS_INDEX_FILE)
Verify OpenAI API key is working
Ensure your query relates to indexed Python files

OpenAI API errors

Verify API key in .env file
Check API usage limits and billing
Ensure model names are correct (gpt-4, text-embedding-ada-002)

File monitoring not working

Check WATCHED_DIR path in .env
Ensure directory contains .py files
Look for error logs in console output
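
Several of the checks above can be automated. The following is a minimal sketch of such a preflight check, assuming the variable names from the Configuration section; the `diagnose` function is hypothetical and not part of the repository. It only inspects the local setup and does not contact the OpenAI API, so a key that is present but invalid would still pass.

```python
import os


def diagnose(env, index_default="./coderag_index.faiss"):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing or empty")

    # A missing index file usually means indexing has not completed yet.
    index_file = env.get("FAISS_INDEX_FILE", index_default)
    if not os.path.isfile(index_file):
        problems.append(f"index file not found: {index_file}")

    watched = env.get("WATCHED_DIR", "")
    if not os.path.isdir(watched):
        problems.append(f"WATCHED_DIR is not a directory: {watched!r}")
    else:
        has_py = any(
            name.endswith(".py")
            for _root, _dirs, files in os.walk(watched)
            for name in files
        )
        if not has_py:
            problems.append(f"no .py files under {watched}")

    return problems
```

Calling `diagnose(os.environ)` after loading `.env` gives a quick list of what to fix before looking at the console logs.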

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes with proper error handling and type hints
Run code quality checks (pre-commit run --all-files)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE-2.0.txt file for details.

🙏 Acknowledgments

OpenAI for embedding and chat models
Facebook AI Similarity Search (FAISS) for vector search
Streamlit for the web interface
Watchdog for file monitoring

⭐ If this project helps you, please give it a star!


Neverdecel/CodeRAG
