This repository provides a system for retrieving data from a database using queries written in natural language. It offers an open-source environment for experimenting with LLM-powered database search: a server setup for running different models, sample code for integrating them into applications, and a PostgreSQL service as the database.
For demonstration purposes, a Pokémon dataset is used.
Key features of the application include:
- Querying the database using natural language input.
- Generating visualizations by the LLM from predefined plot types.
- Automatic fallback logic to select relevant plots when the LLM fails.
- LLM-generated tooltips explaining different parts of the SQL query.
- Natural language description of the database generated by the LLM.
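The natural-language querying step can be illustrated with a prompt-template sketch. The function name, schema string, and prompt wording below are assumptions for illustration, not the project's actual implementation:

```python
# Illustrative sketch of the NL-to-SQL prompting step; the helper and the
# schema hint are hypothetical, not this repository's real code.

SCHEMA_HINT = (
    "Table pokemon(name TEXT, type1 TEXT, type2 TEXT, "
    "hp INT, attack INT, defense INT, speed INT)"
)

def build_prompt(question: str) -> str:
    """Wrap a natural-language question in an instruction the LLM can follow."""
    return (
        "You are a PostgreSQL expert. Given this schema:\n"
        f"{SCHEMA_HINT}\n"
        "Write a single SQL SELECT statement answering:\n"
        f"{question}\nSQL:"
    )

prompt = build_prompt("Which five Pokémon have the highest attack?")
```

The model's completion would then be executed against the database, with the fallback and tooltip features layered on top of the returned SQL.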
LLM Host:
- Flask: lightweight web framework that handles HTTP requests and routes interactions with the LLM.
- Gunicorn: WSGI HTTP server used to serve the Flask app in a production environment.
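The division of labor above can be sketched with a minimal, hypothetical endpoint. The route name, payload shape, and `ask_llm` stub are illustrative assumptions, not the project's actual code:

```python
# Hypothetical sketch: a Flask route that forwards a natural-language
# question to the LLM layer. The real backend's routes may differ.
from flask import Flask, jsonify, request

app = Flask(__name__)

def ask_llm(question: str) -> str:
    # Stub standing in for the real model call.
    return f"SELECT * FROM pokemon  -- for: {question}"

@app.route("/query", methods=["POST"])
def query():
    question = request.get_json().get("question", "")
    return jsonify({"sql": ask_llm(question)})
```

In production, Gunicorn would serve the app, e.g. `gunicorn app:app --bind 0.0.0.0:5000`.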
Prompt Execution:
- Python bindings for running LLaMA models locally for prompt execution.
Plot Rendering:
- Interactive plotting library for rendering visualizations in the browser.
- Data manipulation library used for preparing datasets before visualization.
- Provides advanced math operations to support plot-related calculations.
- Used for generating treemap visualizations based on hierarchical data.
- Color palettes used to enhance plot aesthetics and clarity.
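The fallback behavior mentioned in the feature list (choosing a relevant plot when the LLM fails) could look roughly like this. The plot types and shape-based rules here are illustrative assumptions, not the project's actual logic:

```python
# Hypothetical sketch of the fallback logic that picks a plot type when the
# LLM's own choice is missing or invalid.
PLOT_TYPES = {"bar", "line", "scatter", "treemap"}

def choose_fallback_plot(columns, numeric):
    """Pick a reasonable predefined plot type from the result shape."""
    numeric_count = sum(numeric)
    if len(columns) >= 2 and numeric_count >= 2:
        return "scatter"   # two numeric axes available
    if numeric_count == 1:
        return "bar"       # one category column, one value column
    return "treemap"       # purely categorical / hierarchical data

def select_plot(llm_choice, columns, numeric):
    # Trust the LLM only when it names a known plot type.
    if llm_choice in PLOT_TYPES:
        return llm_choice
    return choose_fallback_plot(columns, numeric)
```

Keeping the rule set deterministic means a failed or hallucinated LLM choice still yields a usable chart.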
Database Communication:
- PostgreSQL adapter for Python used to send and retrieve data via SQL queries.
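PostgreSQL adapters for Python follow the DB-API 2.0 pattern. The sketch below uses the stdlib `sqlite3` module purely to illustrate that same connect/cursor/execute/fetch flow with a toy `pokemon` table; it is not the project's actual database code:

```python
# DB-API illustration using stdlib sqlite3; a PostgreSQL adapter exposes
# the same connect/cursor/execute/fetch interface.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pokemon (name TEXT, attack INTEGER)")
cur.execute("INSERT INTO pokemon VALUES (?, ?)", ("Pikachu", 55))
conn.commit()

# Parameterised queries guard against SQL injection regardless of the driver.
cur.execute("SELECT name FROM pokemon WHERE attack > ?", (50,))
rows = cur.fetchall()
conn.close()
```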
See `requirements.txt` for the packages needed by the LLM backend and `docker-compose.yml` for the service definitions.
- Clone the repository:

  ```shell
  git clone https://github.com/your-username/nl2sql-converter.git
  cd nl2sql-converter
  ```
- Download the LLM model:

  ```shell
  wget -O app/backend/models/ggml-model-Q4_K.gguf "https://huggingface.co/NousResearch/Nous-Capybara-7B-V1-GGUF/resolve/e6263e5fabbdcd2d682364c66ecf54b65f25aa39/ggml-model-Q4_K.gguf?download=true"
  ```

  Or use any other compatible model such as DeepSeek-V3, Mistral, or Nous Capybara.
- Configure environment variables. Copy the example file and edit it with your credentials and model settings:

  ```shell
  cp .env.example .env
  # Open .env in your editor and adjust DB_USER, DB_PASSWORD, LLM_MODEL_NAME, etc.
  ```
- Start the services with Docker Compose:

  ```shell
  docker-compose up --build -d
  ```

  - PostgreSQL will initialize the `pokemon` table and load the CSV dataset.
  - The Flask-Gunicorn backend will be built and started on `0.0.0.0:${FLASK_RUN_PORT}`.
- Verify everything is running:

  ```shell
  docker ps
  # You should see:
  # postgres_nl2sql   postgres:15    Up ...  5432/tcp
  # backend_nl2sql    your-backend   Up ...  5000/tcp
  ```
- Access the application. Open your browser and navigate to `http://localhost:${FLASK_RUN_PORT}`.
All settings are loaded from the `.env` file. Remember to set `FLASK_SECRET_KEY` before deploying to production.
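A minimal sketch of how those settings might be read at startup. The variable names `DB_USER`, `DB_PASSWORD`, `FLASK_RUN_PORT`, and `FLASK_SECRET_KEY` come from the steps above; the defaults and the function itself are illustrative assumptions:

```python
# Illustrative settings loader; the real backend may use a different
# structure or a dotenv helper to populate the environment first.
import os

def load_settings() -> dict:
    return {
        "db_user": os.environ.get("DB_USER", "postgres"),
        "db_password": os.environ.get("DB_PASSWORD", ""),
        "flask_port": int(os.environ.get("FLASK_RUN_PORT", "5000")),
        # Must be set explicitly in production; no safe default exists.
        "secret_key": os.environ.get("FLASK_SECRET_KEY"),
    }

settings = load_settings()
```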
To run the tests, activate the environment and execute:

```shell
pipenv shell
python -m unittest discover
```

- LLMs demand significant compute resources; make sure your hardware can handle intensive inference workloads.
- Treat this tool as an exploratory aid: models may hallucinate or lack up‑to‑date context.
- Do not grant LLM agents permissions beyond read-only database access.
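The last point can also be enforced at the application layer as a defense in depth. This is an illustrative sketch (the function name and keyword list are assumptions); the database role itself should still be restricted to read-only:

```python
# Illustrative application-side guard rejecting anything but a single
# SELECT/WITH statement produced by the LLM.
import re

FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT)\b", re.IGNORECASE
)

def is_read_only(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:          # reject multi-statement payloads
        return False
    if FORBIDDEN.search(stripped):
        return False
    return stripped.upper().startswith(("SELECT", "WITH"))
```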
This project is a starting point—many improvements can be implemented:
- Testing: Add end-to-end and unit tests for the native JavaScript frontend.
- Frontend Frameworks: Explore integrating React, Vue, or similar for better maintainability.
- Backend Optimization: Profile and reduce CPU/memory usage during LLM inference and database queries.
- Production Hardening: Improve security, error handling, and user experience for real-world deployments.
- Streamlined Setup: Automate and simplify installation, configuration, and deployment.
- Special thanks to Larry Greski for curating and sharing the Pokémon dataset, which was used for demonstration and development purposes in this project.
This project is licensed under the MIT License. See the LICENSE file for more details.


