This repository contains the code for the StatGPT backend, which implements the APIs and main logic of the StatGPT application.
More information about StatGPT and its architecture can be found in the documentation repository.
The application is written in Python 3.11 and uses the following main technologies:
| Technology | Purpose |
|---|---|
| AI DIAL SDK | SDK for building applications on top of AI DIAL platform |
| FastAPI | Web framework for API development |
| SQLAlchemy | ORM for database operations |
| LangChain | LLM application framework |
| Pydantic | Data validation and settings |
| sdmx1 | SDMX data handling and provider connections |
- `src/admin_portal` - backend of the administrator portal, which allows the user to add and update data.
- `src/common` - common code used in the `admin_portal` and `statgpt` applications.
- `src/statgpt` - main application that generates responses using LLMs, based on data prepared by `admin_portal`.
- `tests` - unit and integration tests.
- `docker` - Dockerfiles for building Docker images.
The applications are configured using environment variables. The environment variables are described in the following files:
- Common environment variables - used in both applications
- Admin Backend environment variables
- Chat Backend environment variables
1. Install Make
   - MacOS - should already be installed
   - Windows
   - Windows, using Chocolatey
   - Make sure that `make` is in the PATH (run `which make`).
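   For example, the Chocolatey route (assuming Chocolatey is installed; `make` is the name of the Chocolatey package):

   ```shell
   choco install make   # run from an elevated shell
   which make           # verify it is on the PATH
   ```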
2. Install Python 3.11

   Direct installation:
   - MacOS, using Homebrew - `brew install python@3.11`
   - Windows or MacOS, using the official repository
   - Windows, using Chocolatey
   - Make sure that `python3` or `python3.11` is in the PATH and works properly (run `python3.11 --version`).
   Alternative: use pyenv:
   - `pyenv` allows managing different Python versions on the same machine
   - execute the following from the repository root folder:

     ```shell
     pyenv install 3.11
     pyenv local 3.11  # use Python 3.11 for the current project
     ```
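   A quick check that pyenv picked up the local version (`pyenv local` writes a `.python-version` file in the repository root):

   ```shell
   pyenv version      # should report 3.11.x, selected via .python-version
   python --version   # should resolve to Python 3.11
   ```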
3. Install Poetry

   Recommended way - system-wide, independent of any particular Python venv:
   - MacOS - the recommended way to install poetry is to use pipx
   - Windows - the recommended way to install poetry is to use the official installer
   - Make sure that `poetry` is in the PATH and works properly (run `poetry --version`).

   Alternative - venv-specific (using pip):
   - make sure the correct Python venv is activated
   - run `make install_poetry`
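   For example, the pipx route might look like this (a sketch; it assumes pipx itself is already installed):

   ```shell
   pipx install poetry   # installs poetry into its own isolated environment
   poetry --version      # verify it is on the PATH
   ```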
4. Install Docker

   Since Docker Desktop requires a paid license for commercial use, you can use one of the following alternatives:
   - Docker Engine and Docker Compose on Linux
   - Rancher Desktop on Windows or MacOS
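   Whichever option you choose, verify that the Docker CLI and the Compose plugin are available, since the commands below rely on both:

   ```shell
   docker --version         # Docker CLI
   docker compose version   # Compose v2 plugin
   ```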
5. Install GNU gettext

   Required for the localization commands (`make extract_messages`, `make update_messages`, `make compile_messages`):
   - MacOS - `brew install gettext`
   - Linux/WSL - `sudo apt install gettext`
   - Windows (native) - install via Chocolatey: `choco install gettext`

   Verify the installation: `which xgettext msgmerge msgfmt`
Create a Python virtual environment using poetry:

```shell
make init_venv
```

If you see the error `Skipping virtualenv creation, as specified in config file.`, it means the venv was not created because poetry is configured not to create new virtual environments. You can fix this:
- either by updating the poetry config:
  - `poetry config --local virtualenvs.create true` (local config)
  - or `poetry config virtualenvs.create true` (global config)
- or by creating the venv manually: `python -m venv .venv`
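To check which setting poetry is currently using, `poetry config <key>` prints the effective value:

```shell
poetry config virtualenvs.create   # prints true or false
```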
Activate the virtual environment.

For Mac / Linux:

```shell
source .venv/bin/activate
```

For Windows:

```shell
.venv/Scripts/Activate
```

The following will install basic and dev dependencies:
```shell
make install_dev
```

You can copy the template file and fill in values for secrets manually:

```shell
cp .env.template .env
```

The Environment variables section provides links to pages with detailed information about environment variables.
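A minimal sketch for checking that the shell can read values from `.env` (only `ADMIN_MODE` is a variable named elsewhere in this README; it is used here purely as an example):

```shell
set -a        # export every variable defined while sourcing
source .env   # load the environment file
set +a
echo "ADMIN_MODE=${ADMIN_MODE:-<unset>}"   # prints the value, or <unset> if missing
```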
1. Generate the DIAL config:

   ```shell
   make generate_dial_config
   ```

2. Run the DIAL using docker compose:

   ```shell
   docker compose up -d
   ```

3. Apply `alembic` migrations:
   - locally: `make db_migrate`
   - or using Docker:
     - set `ADMIN_MODE=ALEMBIC_UPGRADE` in the `.env` file
     - run `admin_portal` from `docker-compose.yml`

4. Run the Admin backend (if you want to initialize or update data):

   ```shell
   make run_admin
   ```
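As a hypothetical smoke test once the services are up (the port and the `/health` path are placeholders, not taken from this repository; substitute the values from your actual configuration):

```shell
# 8080 and /health are assumptions for illustration only
curl -f http://localhost:8080/health && echo "service is up"
```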
Format the code:

```shell
make format
```

Run linters:

```shell
make lint
```

To automatically apply black and isort on each commit, enable pre-commit hooks:

```shell
make install_pre_commit_hooks
```

This command will set up the git hook scripts.
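Assuming the hooks are managed by the `pre-commit` tool (an assumption based on the wording above), the same checks can also be run on demand:

```shell
pre-commit run --all-files   # run every configured hook against the whole repo
```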
(!) It is critical to note that autogenerate is not intended to be perfect. It is always necessary to manually review and correct the candidate migrations that autogenerate produces.
(!) After creating a new migration, it is necessary to update the `ALEMBIC_TARGET_VERSION` in the `src/common/config/version.py` file to the new version.
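Applying and rolling back migrations can also be done with alembic directly; a sketch, assuming the same `src/alembic.ini` config used by the commands below:

```shell
alembic -c src/alembic.ini upgrade head    # apply all pending migrations
alembic -c src/alembic.ini downgrade -1    # roll back the most recent revision
```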
```shell
make db_autogenerate MESSAGE="Your message"
```

or:

```shell
alembic -c src/alembic.ini revision --autogenerate -m "Your message"
```

To downgrade the database:

```shell
make db_downgrade
```

The project uses GNU gettext for internationalizing dataset formatters. Use these commands when working with translations:
Workflow:

1. Extract translatable strings - run after adding/modifying strings marked with `_()` in formatter code:

   ```shell
   make extract_messages
   ```

   This creates/updates the `locales/dataset.pot` template file.

2. Update translation files - run to sync `.po` files with the new template:

   ```shell
   make update_messages
   ```

   This updates `en/LC_MESSAGES/dataset.po` and `uk/LC_MESSAGES/dataset.po` with new strings.

3. Compile translations - run after translating strings in `.po` files to generate binary `.mo` files:

   ```shell
   make compile_messages
   ```

Or use the shorthand: `make locales`
Note: All commands require GNU gettext to be installed (see Prerequisites).
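Putting the workflow together, a typical translation pass uses only the commands above:

```shell
make extract_messages    # refresh the locales/dataset.pot template
make update_messages     # sync the en/uk .po files with the template
# ... translate the new strings in the .po files ...
make compile_messages    # generate the binary .mo files
```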
- Integration tests require a running test database and Elasticsearch. They are part of the `docker-compose.yml` file. The Docker containers with this database/Elasticsearch don't have volumes to store data, so they are always fresh after `docker compose down`.
- To run integration tests, uncomment the `vectordb-test` and `elasticsearch-test` containers in the `docker-compose.yml` file (see the sketch after this list). You might also need to comment out the `elasticsearch` container if your machine doesn't have enough resources.
- To run end-to-end tests, first run StatGPT locally. This step is not required for other tests.
- Run tests:
  - all tests except end-to-end (unit and integration): `make test`
  - only unit tests: `make test_unit`
  - only integration tests: `make test_integration`
  - just end-to-end tests: `make test_e2e`
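A sketch of an integration-test run, assuming the test containers have been uncommented in `docker-compose.yml` as described above:

```shell
docker compose up -d vectordb-test elasticsearch-test   # start only the test dependencies
make test_integration                                   # run the integration test suite
docker compose down                                     # the test containers keep no volumes, so this resets them
```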