Contexi lets you interact with your entire codebase as a code review co-pilot, using an LLM locally.
Contexi combines several highly optimized techniques to provide the most relevant, context-aware responses to questions about your code and data:
- Multi-prompt, contextually guided Retrieval-Augmented Generation
- Self-critique and self-correction using chain-of-thought
- Document re-ranking
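To give a flavor of the self-corrective idea, here is a minimal, self-contained toy sketch of a retrieve-generate-critique loop. All helpers and the corpus are stand-ins invented for illustration; this is not Contexi's actual code.

```python
# Toy corpus standing in for an indexed codebase.
CORPUS = {
    "login": "The Login class handles user authentication.",
    "config": "config.yml controls which file types are loaded.",
}

def retrieve(query):
    """Toy retrieval: return corpus entries whose key appears in the query."""
    return [text for key, text in CORPUS.items() if key in query.lower()]

def generate(question, context):
    """Toy 'LLM': answer from retrieved context, or admit ignorance."""
    return context[0] if context else "I don't know."

def critique(answer):
    """Toy self-critique: flag unhelpful answers for another retrieval pass."""
    return "ok" if "don't know" not in answer else "retry"

def answer_with_self_correction(question, max_iterations=3):
    query = question
    answer = generate(question, retrieve(query))
    for _ in range(max_iterations):
        if critique(answer) == "ok":
            break
        query += " login config"  # toy query expansion on a failed critique
        answer = generate(question, retrieve(query))
    return answer
```

The real system critiques and refines with the LLM itself (see `max_iterations` in the configuration section below), but the loop shape is the same: generate, self-critique, re-retrieve, and try again.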
✅ Analyzes and understands your entire codebase and data, not just isolated code snippets.
✅ Answers questions about potential security vulnerabilities anywhere in the code.
✅ Imports code from a Git URL for analysis.
✅ Learns from follow-up questions and keeps answering based on chat history context.
✅ Runs entirely on your local machine for free; no internet connection is required.
- Ollama - preferred model: qwen2.5 (for more precise results)
- 16 GB RAM recommended, plus plenty of free disk space
- Python 3.7+
- Various Python dependencies (see `requirements.txt`)
- Tested on a Java codebase (you can configure `config.yml` to load other code/file formats)
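As an illustration, a `config.yml` that loads Python files in addition to Java might look like the fragment below. The key names here are assumptions for the sketch; check the `config.yml` shipped in the repository for the actual schema.

```yaml
# Hypothetical keys -- consult the shipped config.yml for the real names
file_extensions:
  - .java
  - .py
model_name: qwen2.5
```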
We recommend installing the app in a Python virtual environment.
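For example, on macOS or Linux, you can create and activate one with the standard `venv` module:

```shell
python3 -m venv venv          # create the virtual environment
source venv/bin/activate      # activate it in the current shell
```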
1. Clone this repository:
   ```shell
   git clone https://github.com/AI-Security-Research-Group/Contexi.git
   cd Contexi
   ```
2. Install the required Python packages:
   ```shell
   pip install -r requirements.txt
   ```
3. Edit `config.yml` parameters based on your requirements.
4. Run:
   ```shell
   python3 main.py
   ```
Upon running `main.py`, just select one of the options below:
```shell
(venv) coder@system ~/home/Contexi $
Welcome to Contexi!
Please select a mode to run:
1. Interactive session
2. UI
3. API
Enter your choice (1, 2, or 3):
```
You are ready to use the magic stick. 🪄
Send POST requests to http://localhost:8000/ask with your questions.
Example using curl:
```shell
curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{"question": "What is the purpose of the Login class?"}'
```
Response format:
```json
{
  "answer": "The Login class is responsible for handling user authentication..."
}
```
Open an issue if you have problems installing or running this script. (The script is tested in a macOS environment.)
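The same request can be made from Python with only the standard library. This is a small sketch assuming the `/ask` endpoint and response format shown above:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask"  # default Contexi API endpoint

def ask(question: str, url: str = API_URL) -> str:
    """POST a question to the Contexi API and return the 'answer' field."""
    data = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]
```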
You can customize various aspects of the script:
- Adjust `chunk_size` and `chunk_overlap` in the `split_documents_into_chunks` function to change how documents are split.
- Modify the `PROMPT_TEMPLATE` to alter how the LLM interprets queries and generates responses.
- Change `max_iterations` in `perform_crag` to adjust how many times the system will attempt to refine an answer.
- Modify `num_ctx` in `initialize_llm` to adjust the LLM context window for better results.
- Adjust the `n_ideas` parameter to define the depth of accuracy and completeness you need in the answers.
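To see how `chunk_size` and `chunk_overlap` interact, here is a minimal character-based splitter sketch. It is not Contexi's actual `split_documents_into_chunks`, which may split on tokens or separators, but the overlap mechanics are the same:

```python
def split_into_chunks(text: str, chunk_size: int = 200, chunk_overlap: int = 50):
    """Slide a window of chunk_size characters, stepping by size - overlap,
    so the tail of each chunk is repeated at the head of the next one."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Larger overlap keeps more shared context between adjacent chunks (better retrieval continuity) at the cost of more chunks to index and embed.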
- If you encounter memory issues, try reducing `chunk_size` and `num_ctx`, or the number of documents processed at once.
- Ensure that Ollama is running and the correct model name is set in the `config.yml` file.
- Codebase Analysis: Understand and explore large code repositories by asking natural language questions.
- Security Auditing: Identify potential security vulnerabilities by querying specific endpoints or functions.
- Educational Tools: Help new developers understand codebases by providing detailed answers to their questions.
- Documentation Generation: Generate explanations or documentation for code segments, and more.
- Make the important parameters configurable using a YAML file ✅
- Drag and drop a folder in the UI for analysis
- Scan the source folder and suggest file extensions to be analyzed
- Make config.yml configurable in the UI
- Session-based chat to switch context on each new session
- Persistent chat UI interface upon page refresh
- Add only the most recent response to the history context
- Implement the tree-of-thoughts concept
- Create a web interface ✅
- Integrate the repository import feature, which automatically imports a repo locally to perform analysis ✅
- Use Semgrep to identify potential vulnerabilities based on patterns.
- Pass the identified snippets to a data flow analysis tool to determine if the input is user-controlled.
- Provide the LLM with the code snippet, data flow information, and any relevant AST representations.
- Ask the LLM to assess the risk based on this enriched context.
- Use the LLM's output to prioritize vulnerabilities, focusing on those where user input reaches dangerous functions.
- Optionally, perform dynamic analysis or manual code review on high-risk findings to confirm exploitability.
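The prioritization step above might be sketched as a simple scoring pass over enriched findings. The dictionary keys here are assumptions for illustration, not a real Semgrep or LLM output schema:

```python
def prioritize(findings):
    """Rank enriched findings so that user-controlled input reaching a
    dangerous sink comes first (toy scoring, illustrative only)."""
    def risk(finding):
        score = 0
        if finding.get("user_controlled"):   # from data flow analysis
            score += 2
        if finding.get("dangerous_sink"):    # from Semgrep pattern match
            score += 1
        return score
    return sorted(findings, key=risk, reverse=True)
```

Findings with both signals float to the top for dynamic analysis or manual review; the rest can be triaged later.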
Contributions to Contexi are welcome! Please submit pull requests or open issues on the GitHub repository.
