h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.

✨ Features

  • AST-Based Chunking: Intelligent code parsing using Abstract Syntax Trees for optimal chunk boundaries
  • Embedding & Semantic Search: Generates embeddings with OpenAI's text-embedding-3-small model (support for voyage-code-3 planned)
  • Vector Database: PostgreSQL with the pgvector extension for efficient similarity search
  • Multi-Language Support: TypeScript and JavaScript, extensible to other languages
  • Multi-Project Support: Index and search across multiple projects
  • MCP Integration: Seamlessly connects with AI coding assistants through Model Context Protocol
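The AST-based chunking idea can be sketched as follows. This is a simplified illustration, not h-codex's actual implementation: the `AstNode` shape and `chunkByAst` function are hypothetical stand-ins for what a tree-sitter parse tree would provide.

```typescript
// Sketch of AST-based chunking: split source code at top-level AST node
// boundaries (functions, classes) rather than at arbitrary character
// offsets. `AstNode` is a hypothetical stand-in for a tree-sitter node.
interface AstNode {
  type: string;       // e.g. "function_declaration", "class_declaration"
  startIndex: number; // character offset where the node begins
  endIndex: number;   // character offset where the node ends
}

const CHUNK_SIZE = 1000; // mirrors the CHUNK_SIZE config option

function chunkByAst(source: string, topLevelNodes: AstNode[]): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const node of topLevelNodes) {
    const text = source.slice(node.startIndex, node.endIndex);
    // Close the current chunk when adding this node would exceed the limit,
    // so chunk boundaries always coincide with syntax-node boundaries.
    if (current.length + text.length > CHUNK_SIZE && current.length > 0) {
      chunks.push(current);
      current = "";
    }
    current += text + "\n";
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Usage with two hand-written "nodes" over a toy source string:
const src = "function a() {}\nfunction b() {}";
const nodes: AstNode[] = [
  { type: "function_declaration", startIndex: 0, endIndex: 15 },
  { type: "function_declaration", startIndex: 16, endIndex: 31 },
];
console.log(chunkByAst(src, nodes).length); // → 1 (both fit in one chunk)
```

The key property this preserves is that a chunk never cuts through the middle of a function or class, which keeps each embedded unit semantically coherent.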

🚀 Demo

(demo GIF)

💻 Getting Started

h-codex can be integrated with AI assistants through the Model Context Protocol.

Example with Claude Desktop

Edit your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_BASE_URL": "https://api.openai.com/v1",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}

LLM_BASE_URL defaults to OpenAI's base URL (https://api.openai.com/v1), so it only needs to be set when using a different provider.

🛠️ Development

Prerequisites

  • Node.js (v18+)
  • pnpm - Package manager
  • Docker - For running PostgreSQL with pgvector
  • OpenAI API key for embeddings

Getting Started

  1. Clone the repository

    git clone https://github.com/hpbyte/h-codex.git
    cd h-codex
  2. Set up environment variables

    cp packages/core/.env.example packages/core/.env

    Edit the .env file with your OpenAI API key and other configuration options.

  3. Install dependencies

    pnpm install
  4. Start PostgreSQL database

    cd dev && docker compose up -d
  5. Set up the database

    pnpm run db:migrate
  6. Start development server

    pnpm dev

🔧 Configuration Options

| Environment Variable   | Description                                | Default                                                 |
| ---------------------- | ------------------------------------------ | ------------------------------------------------------- |
| `LLM_API_KEY`          | API key for the embedding provider         | Required                                                |
| `LLM_BASE_URL`         | Base URL for the embedding provider        | `https://api.openai.com/v1`                             |
| `EMBEDDING_MODEL`      | OpenAI model used for embeddings           | `text-embedding-3-small`                                |
| `CHUNK_SIZE`           | Maximum chunk size in characters           | `1000`                                                  |
| `SEARCH_RESULTS_LIMIT` | Maximum number of search results returned  | `10`                                                    |
| `SIMILARITY_THRESHOLD` | Minimum similarity score for results       | `0.5`                                                   |
| `DB_CONNECTION_STRING` | PostgreSQL connection string               | `postgresql://postgres:password@localhost:5432/h-codex` |
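A search with these settings plausibly maps onto a pgvector query like the one below. This is a hedged sketch, not h-codex's actual SQL: the table and column names (`chunks`, `embedding`, `content`) are assumptions, while `embedding <=> $1` is pgvector's cosine-distance operator, making `1 - (embedding <=> $1)` the standard idiom for cosine similarity.

```typescript
// Sketch of how SIMILARITY_THRESHOLD and SEARCH_RESULTS_LIMIT could be
// applied in a pgvector similarity query. Table/column names are
// hypothetical; $1 is the query embedding passed as a parameter.
const SIMILARITY_THRESHOLD = 0.5;
const SEARCH_RESULTS_LIMIT = 10;

function buildSearchQuery(): string {
  return `
    SELECT content, 1 - (embedding <=> $1) AS similarity
    FROM chunks
    WHERE 1 - (embedding <=> $1) >= ${SIMILARITY_THRESHOLD}
    ORDER BY embedding <=> $1
    LIMIT ${SEARCH_RESULTS_LIMIT}
  `;
}

console.log(buildSearchQuery());
```

Ordering by raw distance (ascending) and filtering on similarity are equivalent views of the same metric; the threshold simply discards weak matches before the limit is applied.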

🏗️ Architecture

graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
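The ingestion flow in the diagram can be mirrored in a few lines. Everything below is a hypothetical sketch of the data flow only — the interfaces and method names are stand-ins, not h-codex's real API.

```typescript
// Sketch of the ingestion pipeline from the diagram:
// Explorer -> Chunker -> Embedder -> Indexer. All names are hypothetical.
interface Chunk { content: string }
interface EmbeddedChunk extends Chunk { embedding: number[] }

const explorer = {
  discover: (root: string): string[] => [`${root}/index.ts`], // file discovery
};
const chunker = {
  chunk: (file: string): Chunk[] => [{ content: `// from ${file}` }], // AST chunking
};
const embedder = {
  // Stand-in for a call to the embedding model (text-embedding-3-small).
  embed: (c: Chunk): EmbeddedChunk => ({ ...c, embedding: [0.1, 0.2] }),
};
const indexer = {
  // Would persist chunks via the Repository into PostgreSQL; here it
  // just reports how many chunks were indexed.
  index: (chunks: EmbeddedChunk[]): number => chunks.length,
};

function ingest(root: string): number {
  const files = explorer.discover(root);
  const chunks: Chunk[] = [];
  for (const f of files) chunks.push(...chunker.chunk(f));
  const embedded = chunks.map(c => embedder.embed(c));
  return indexer.index(embedded);
}

console.log(ingest("my-project")); // → 1 chunk indexed in this toy run
```

The MCP tools then sit on either end of this pipeline: the Code Index Tool drives the Indexer, and the Code Search Tool queries the Repository directly.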

🗺️ Roadmap

  • Support for additional embedding providers (Voyage AI)
  • Enhanced language support with more tree-sitter parsers

📄 License

This project is licensed under the MIT License.