Zhara is an advanced AI assistant that combines large language models, speech recognition, text-to-speech, and lip sync capabilities for a natural interactive experience.
## Features

- Speech-to-text (STT) using Whisper
- Text-to-speech (TTS) using Coqui TTS
- Large Language Model integration with Ollama
- Lip sync generation with Rhubarb
- Modern web interface
## Prerequisites

- Python 3.10 or higher
- At least 4GB RAM available
- Internet connection for downloading models
- Ollama installed and running (for LLM functionality)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/YashVinchhi/zhara.git
  cd zhara
  ```
- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Install Rhubarb Lip Sync (for lip sync generation):
  - Download a build from the Rhubarb Lip Sync releases page
  - Extract the archive and add the `rhubarb` executable to your system PATH (see the verification sketch after these steps)
- Install and start Ollama:
  - Follow the Ollama installation guide
  - Pull a model (e.g., `ollama pull qwen2.5-coder:14b`)
  - Ensure Ollama is running on `http://localhost:11434` (also covered by the verification sketch below)
- Run the application:

  ```bash
  python zhara.py
  ```
- Access the web interface at `http://localhost:8000`
Note: To run with Docker or Docker Compose, see DOCKER.md for detailed instructions.
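Once the steps above are done, the following quick check confirms the two external dependencies are wired up. This is a minimal sketch assuming the default Ollama host and that the Rhubarb binary is named `rhubarb`; `/api/tags` is Ollama's model-listing endpoint:

```python
import shutil

import requests

# Rhubarb must be discoverable on PATH for lip sync generation
print("Rhubarb on PATH:", shutil.which("rhubarb") or "NOT FOUND")

# Ollama lists locally pulled models at /api/tags
try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    print("Ollama models:", [m["name"] for m in resp.json().get("models", [])])
except requests.exceptions.ConnectionError:
    print("Ollama not reachable - is the server running?")
```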
## Configuration

You can customize Zhara's behavior using environment variables:

```bash
export MAX_AUDIO_DURATION=600
export MAX_TEXT_LENGTH=2000
export MAX_FILE_AGE=24
export OLLAMA_HOST=http://localhost:11434
python zhara.py
```

Available environment variables:

- `MAX_AUDIO_DURATION`: Maximum audio duration in seconds (default: 300)
- `MAX_TEXT_LENGTH`: Maximum text length for TTS in characters (default: 1000)
- `MAX_FILE_AGE`: Maximum age of generated files in hours (default: 24)
- `OLLAMA_HOST`: Ollama server host (default: `http://localhost:11434`)
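A sketch of how these values are typically consumed at startup; the names and defaults come from the list above, but the use of `os.getenv` is an assumption about zhara.py's internals:

```python
import os

# Defaults mirror the documented ones; override via environment variables
MAX_AUDIO_DURATION = int(os.getenv("MAX_AUDIO_DURATION", "300"))  # seconds
MAX_TEXT_LENGTH = int(os.getenv("MAX_TEXT_LENGTH", "1000"))       # characters
MAX_FILE_AGE = int(os.getenv("MAX_FILE_AGE", "24"))               # hours
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
```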
## API Endpoints

- Text Chat Endpoint

  ```
  POST /ask
  Content-Type: application/json

  {"text": "your question", "model": "default"}
  ```

  Returns:

  ```json
  {
    "reply": "AI response text",
    "audio_url": "/audio/response_xyz.wav",
    "viseme_url": "/viseme/viseme_xyz.json"
  }
  ```
- Speech-to-Text Endpoint

  ```
  POST /stt
  Content-Type: multipart/form-data

  file: <audio_file>
  ```

  Returns:

  ```json
  {"text": "transcribed text"}
  ```
- Health Check

  ```
  GET /health
  ```

  Returns:

  ```json
  {"status": "healthy", "timestamp": "2025-08-12T14:30:00.000Z"}
  ```
## Examples

- Using curl:

  ```bash
  # Send a text query
  curl -X POST "http://localhost:8000/ask" \
    -H "Content-Type: application/json" \
    -d '{"text": "Hello, Zhara!"}'

  # Send audio for transcription
  curl -X POST "http://localhost:8000/stt" \
    -F "file=@your_audio.wav"

  # Check health
  curl "http://localhost:8000/health"
  ```
- Using Python:

  ```python
  import requests

  # Send a text query
  response = requests.post(
      "http://localhost:8000/ask",
      json={"text": "Hello, Zhara!"},
  )
  print(response.json())

  # Send audio for transcription
  with open("your_audio.wav", "rb") as f:
      response = requests.post(
          "http://localhost:8000/stt",
          files={"file": f},
      )
  print(response.json())
  ```
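The `/ask` response also points at the generated speech and viseme data. Here is a sketch of fetching those artifacts and checking `/health` first; the filenames inside `audio_url` and `viseme_url` are generated per request, and the exact viseme JSON shape follows whatever Rhubarb output Zhara serves:

```python
import requests

BASE = "http://localhost:8000"

# Confirm the service is up before sending work its way
health = requests.get(f"{BASE}/health", timeout=5).json()
print("Status:", health["status"])

# Ask a question; the reply includes URLs for the generated artifacts
reply = requests.post(f"{BASE}/ask", json={"text": "Hello, Zhara!"}).json()
print(reply["reply"])

# Download the synthesized speech for playback
audio = requests.get(BASE + reply["audio_url"])
with open("response.wav", "wb") as f:
    f.write(audio.content)

# Fetch the matching viseme timings for lip sync
visemes = requests.get(BASE + reply["viseme_url"]).json()
print("Viseme payload:", visemes)
```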
## Troubleshooting

- "Connection refused" when accessing Ollama:
  - Check if Ollama is running: `ollama list` or `ps aux | grep ollama`
  - Verify Ollama is accessible at the configured host
  - Verify the `OLLAMA_HOST` environment variable is set correctly
- Audio generation fails:
  - Check available disk space (see the diagnostic sketch after this list)
  - Verify storage directory permissions
  - Check if TTS models are properly installed
- Application won't start:
  - Check for port conflicts on port 8000 (see the diagnostic sketch after this list)
  - Verify all Python dependencies are installed
  - Check system resources and available memory
- Rhubarb lip sync errors:
  - Ensure Rhubarb is installed and on your PATH
  - Check audio file format compatibility
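For the disk-space and port-conflict cases above, a quick local diagnostic. This is a minimal sketch: it checks free space in the current directory (stand-in for wherever your instance writes audio files) and tries to bind Zhara's default port:

```python
import shutil
import socket

# Free space where generated audio would be written (current directory here)
free_gib = shutil.disk_usage(".").free / 1024**3
print(f"Free disk space: {free_gib:.1f} GiB")

# Try to bind Zhara's port; an OSError means another process already holds it
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        s.bind(("127.0.0.1", 8000))
        print("Port 8000 is free")
    except OSError:
        print("Port 8000 is already in use")
```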
If you encounter issues:

- Check the application logs in your terminal
- Verify all prerequisites are met
- Create an issue on GitHub with:
  - Error messages
  - Application logs
  - System information
## Notes

- Make sure Ollama is running locally if you want to use the LLM endpoint.
- Static files are served from the project directory root.
## License

Copyright © 2025 Yash Vinchhi. All rights reserved.

This software is proprietary. Unauthorized copying, modification, distribution, or use of this code, via any medium, is strictly prohibited without explicit permission.