A semantic code search tool for intelligent, cross-repo context retrieval.
- AST-Based Chunking: Intelligent code parsing using Abstract Syntax Trees for optimal chunk boundaries
- Embedding & Semantic Search: Using OpenAI's text-embedding-3-small model (support for voyage-code-3 planned)
- Vector Database: PostgreSQL with pgvector extension for efficient similarity search
- Multi-Language Support: TypeScript and JavaScript, extensible to other languages
- Multi-Project Support: Index and search multiple projects
- MCP Integration: Seamlessly connects with AI coding assistants through Model Context Protocol
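
To make the semantic search feature concrete, here is a minimal in-memory sketch of ranking chunks by cosine similarity. It is an illustration only: in h-codex the similarity search runs in PostgreSQL through pgvector, and the `IndexedChunk` shape, the `rankChunks` helper, and the choice of cosine similarity are assumptions made for this example. The default threshold and limit mirror the documented `SIMILARITY_THRESHOLD` and `SEARCH_RESULTS_LIMIT` defaults.

```typescript
// Hypothetical shape of an indexed chunk; the real h-codex schema may differ.
interface IndexedChunk {
  filePath: string;
  content: string;
  embedding: number[];
}

interface SearchHit {
  chunk: IndexedChunk;
  similarity: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, drop those below the threshold, return the best `limit` hits.
function rankChunks(
  query: number[],
  chunks: IndexedChunk[],
  threshold = 0.5,
  limit = 10,
): SearchHit[] {
  return chunks
    .map((chunk) => ({ chunk, similarity: cosineSimilarity(query, chunk.embedding) }))
    .filter((hit) => hit.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A vector database does the same filtering and ordering, but with an index so that it does not have to score every stored chunk.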
h-codex can be integrated with AI assistants through the Model Context Protocol.
Edit your claude_mcp_settings.json file:
{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_BASE_URL": "your_llm_base_url_here (default is openai baseurl: https://api.openai.com/v1)",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
- Node.js (v18+)
- pnpm - Package manager
- Docker - For running PostgreSQL with pgvector
- OpenAI API key for embeddings
- Clone the repository

      git clone https://github.com/hpbyte/h-codex.git
      cd h-codex

- Set up environment variables

      cp packages/core/.env.example packages/core/.env

  Edit the .env file with your OpenAI API key and other configuration options.

- Install dependencies

      pnpm install

- Start PostgreSQL database

      cd dev && docker compose up -d

- Set up the database

      pnpm run db:migrate

- Start development server

      pnpm dev

| Environment Variable | Description | Default |
|---|---|---|
| LLM_API_KEY | LLM API key for embeddings | Required |
| LLM_BASE_URL | Base URL of the LLM API used for embeddings | https://api.openai.com/v1 |
| EMBEDDING_MODEL | OpenAI model for embeddings | text-embedding-3-small |
| CHUNK_SIZE | Maximum chunk size in characters | 1000 |
| SEARCH_RESULTS_LIMIT | Max search results returned | 10 |
| SIMILARITY_THRESHOLD | Minimum similarity for results | 0.5 |
| DB_CONNECTION_STRING | PostgreSQL connection string | postgresql://postgres:password@localhost:5432/h-codex |
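
As an illustration of how these variables and defaults could be consumed, here is a minimal sketch of a config loader. The `HCodexConfig` type and the `loadConfig` and `readNumber` helpers are hypothetical names for this example, not part of the h-codex API; only the variable names and default values come from the table above.

```typescript
// Hypothetical config shape; field names are illustrative.
interface HCodexConfig {
  llmApiKey: string;
  llmBaseUrl: string;
  embeddingModel: string;
  chunkSize: number;
  searchResultsLimit: number;
  similarityThreshold: number;
  dbConnectionString: string;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the documented default when unset.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

// Build the config from an environment-like object, applying the table's defaults.
function loadConfig(env: Env): HCodexConfig {
  const llmApiKey = env["LLM_API_KEY"];
  if (!llmApiKey) throw new Error("LLM_API_KEY is required");
  return {
    llmApiKey,
    llmBaseUrl: env["LLM_BASE_URL"] ?? "https://api.openai.com/v1",
    embeddingModel: env["EMBEDDING_MODEL"] ?? "text-embedding-3-small",
    chunkSize: readNumber(env, "CHUNK_SIZE", 1000),
    searchResultsLimit: readNumber(env, "SEARCH_RESULTS_LIMIT", 10),
    similarityThreshold: readNumber(env, "SIMILARITY_THRESHOLD", 0.5),
    dbConnectionString:
      env["DB_CONNECTION_STRING"] ?? "postgresql://postgres:password@localhost:5432/h-codex",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Failing fast on a missing key or a non-numeric value surfaces misconfiguration at startup rather than during indexing.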
graph TD
  subgraph "Core Package"
    subgraph "Ingestion Pipeline"
      Explorer["Explorer<br/>(file discovery)"]
      Chunker["Chunker<br/>(AST parsing & chunking)"]
      Embedder["Embedder<br/>(semantic embeddings)"]
      Indexer["Indexer<br/>(orchestration)"]
      Explorer --> Chunker
      Chunker --> Embedder
      Embedder --> Indexer
    end
    subgraph "Storage Layer"
      Repository["Repository"]
    end
    Indexer --> Repository
    Repository --> Database[(PostgreSQL Vector Database)]
  end
  subgraph "MCP Package"
    MCPServer["MCP Server"]
    CodeIndexTool["Code Index Tool"]
    CodeSearchTool["Code Search Tool"]
    MCPServer --> CodeIndexTool
    MCPServer --> CodeSearchTool
  end
  CodeIndexTool --> Indexer
  CodeSearchTool --> Repository
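
The ingestion flow in the diagram can be sketched as a small set of interfaces wired together by an orchestrating function. This is a rough sketch under assumptions: the interface names mirror the diagram labels, but every method name and signature here (`discover`, `chunk`, `embed`, `save`, `indexProject`) is hypothetical and not taken from the h-codex source.

```typescript
// Hypothetical data shapes; the real h-codex types may differ.
interface Chunk {
  filePath: string;
  content: string;
}

interface EmbeddedChunk extends Chunk {
  embedding: number[];
}

// One interface per box in the diagram (method names are assumptions).
interface Explorer {
  discover(root: string): Promise<string[]>;
}
interface Chunker {
  chunk(filePath: string): Promise<Chunk[]>;
}
interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}
interface Repository {
  save(chunks: EmbeddedChunk[]): Promise<void>;
}

interface PipelineDeps {
  explorer: Explorer;
  chunker: Chunker;
  embedder: Embedder;
  repository: Repository;
}

// Plays the Indexer role: discover files, chunk each one, embed the chunks,
// then persist them. Returns the number of chunks stored.
async function indexProject(root: string, deps: PipelineDeps): Promise<number> {
  let total = 0;
  for (const filePath of await deps.explorer.discover(root)) {
    const chunks = await deps.chunker.chunk(filePath);
    if (chunks.length === 0) continue;
    const vectors = await deps.embedder.embed(chunks.map((c) => c.content));
    if (vectors.length !== chunks.length) {
      throw new Error(`embedder returned ${vectors.length} vectors for ${chunks.length} chunks`);
    }
    await deps.repository.save(chunks.map((c, i) => ({ ...c, embedding: vectors[i] })));
    total += chunks.length;
  }
  return total;
}
```

Keeping each stage behind an interface means a different embedding provider or storage backend can be swapped in without touching the orchestration code.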
- Support for additional embedding providers (Voyage AI)
- Enhanced language support with more tree-sitter parsers
This project is licensed under the MIT License