Zhara is an advanced AI assistant that combines large language models, speech recognition, text-to-speech, and lip sync capabilities for a natural interactive experience.
## Features

- Speech-to-text (STT) using Whisper
- Text-to-speech (TTS) using Coqui TTS
- Large Language Model integration with Ollama
- Lip sync generation with Rhubarb
- Modern web interface
## Prerequisites

- Python 3.10 or higher
- At least 4GB RAM available
- Internet connection for downloading models
- Ollama installed and running (for LLM functionality)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/YashVinchhi/zhara.git
  cd zhara
  ```
- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Install Rhubarb Lip Sync (for lip sync generation):
  - Download a build from the Rhubarb Lip Sync releases page
  - Extract the archive and add the `rhubarb` executable to your system PATH (see the verification sketch after these steps)
- Install and start Ollama:
  - Follow the Ollama installation guide
  - Pull a model (e.g., `ollama pull qwen2.5-coder:14b`)
  - Ensure Ollama is running on `http://localhost:11434` (also covered by the verification sketch below)
- Run the application:

  ```bash
  python zhara.py
  ```
- Access the web interface at `http://localhost:8000`
Note: To run with Docker or Docker Compose, see DOCKER.md for detailed instructions.
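Once the steps above are done, the following quick check confirms the two external dependencies are wired up. This is a minimal sketch assuming the default Ollama host and that the Rhubarb binary is named `rhubarb`; `/api/tags` is Ollama's model-listing endpoint:

```python
import shutil

import requests

# Rhubarb must be discoverable on PATH for lip sync generation
print("Rhubarb on PATH:", shutil.which("rhubarb") or "NOT FOUND")

# Ollama lists locally pulled models at /api/tags
try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    print("Ollama models:", [m["name"] for m in resp.json().get("models", [])])
except requests.exceptions.ConnectionError:
    print("Ollama not reachable - is the server running?")
```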
## Configuration

You can customize Zhara's behavior using environment variables:

```bash
export MAX_AUDIO_DURATION=600
export MAX_TEXT_LENGTH=2000
export MAX_FILE_AGE=24
export OLLAMA_HOST=http://localhost:11434
python zhara.py
```

Available environment variables:

- `MAX_AUDIO_DURATION`: Maximum audio duration in seconds (default: 300)
- `MAX_TEXT_LENGTH`: Maximum text length for TTS in characters (default: 1000)
- `MAX_FILE_AGE`: Maximum age of generated files in hours (default: 24)
- `OLLAMA_HOST`: Ollama server host (default: `http://localhost:11434`)
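A sketch of how these values are typically consumed at startup; the names and defaults come from the list above, but the use of `os.getenv` is an assumption about zhara.py's internals:

```python
import os

# Defaults mirror the documented ones; override via environment variables
MAX_AUDIO_DURATION = int(os.getenv("MAX_AUDIO_DURATION", "300"))  # seconds
MAX_TEXT_LENGTH = int(os.getenv("MAX_TEXT_LENGTH", "1000"))       # characters
MAX_FILE_AGE = int(os.getenv("MAX_FILE_AGE", "24"))               # hours
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
```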
## API Endpoints

- Text Chat Endpoint

  ```
  POST /ask
  Content-Type: application/json

  {"text": "your question", "model": "default"}
  ```

  Returns:

  ```json
  {
    "reply": "AI response text",
    "audio_url": "/audio/response_xyz.wav",
    "viseme_url": "/viseme/viseme_xyz.json"
  }
  ```
- Speech-to-Text Endpoint

  ```
  POST /stt
  Content-Type: multipart/form-data

  file: <audio_file>
  ```

  Returns:

  ```json
  {"text": "transcribed text"}
  ```
- Health Check

  ```
  GET /health
  ```

  Returns:

  ```json
  {"status": "healthy", "timestamp": "2025-08-12T14:30:00.000Z"}
  ```
## Examples

- Using curl:

  ```bash
  # Send a text query
  curl -X POST "http://localhost:8000/ask" \
    -H "Content-Type: application/json" \
    -d '{"text": "Hello, Zhara!"}'

  # Send audio for transcription
  curl -X POST "http://localhost:8000/stt" \
    -F "file=@your_audio.wav"

  # Check health
  curl "http://localhost:8000/health"
  ```
- Using Python:

  ```python
  import requests

  # Send a text query
  response = requests.post(
      "http://localhost:8000/ask",
      json={"text": "Hello, Zhara!"},
  )
  print(response.json())

  # Send audio for transcription
  with open("your_audio.wav", "rb") as f:
      response = requests.post(
          "http://localhost:8000/stt",
          files={"file": f},
      )
  print(response.json())
  ```
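The `/ask` response also points at the generated speech and viseme data. Here is a sketch of fetching those artifacts and checking `/health` first; the filenames inside `audio_url` and `viseme_url` are generated per request, and the exact viseme JSON shape follows whatever Rhubarb output Zhara serves:

```python
import requests

BASE = "http://localhost:8000"

# Confirm the service is up before sending work its way
health = requests.get(f"{BASE}/health", timeout=5).json()
print("Status:", health["status"])

# Ask a question; the reply includes URLs for the generated artifacts
reply = requests.post(f"{BASE}/ask", json={"text": "Hello, Zhara!"}).json()
print(reply["reply"])

# Download the synthesized speech for playback
audio = requests.get(BASE + reply["audio_url"])
with open("response.wav", "wb") as f:
    f.write(audio.content)

# Fetch the matching viseme timings for lip sync
visemes = requests.get(BASE + reply["viseme_url"]).json()
print("Viseme payload:", visemes)
```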
## Troubleshooting

- "Connection refused" when accessing Ollama:
  - Check if Ollama is running: `ollama list` or `ps aux | grep ollama`
  - Verify Ollama is accessible at the configured host
  - Verify the `OLLAMA_HOST` environment variable is set correctly
- Audio generation fails:
  - Check available disk space (see the diagnostic sketch after this list)
  - Verify storage directory permissions
  - Check if TTS models are properly installed
- Application won't start:
  - Check for port conflicts on port 8000 (see the diagnostic sketch after this list)
  - Verify all Python dependencies are installed
  - Check system resources and available memory
- Rhubarb lip sync errors:
  - Ensure Rhubarb is installed and on your PATH
  - Check audio file format compatibility
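For the disk-space and port-conflict cases above, a quick local diagnostic. This is a minimal sketch: it checks free space in the current directory (stand-in for wherever your instance writes audio files) and tries to bind Zhara's default port:

```python
import shutil
import socket

# Free space where generated audio would be written (current directory here)
free_gib = shutil.disk_usage(".").free / 1024**3
print(f"Free disk space: {free_gib:.1f} GiB")

# Try to bind Zhara's port; an OSError means another process already holds it
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        s.bind(("127.0.0.1", 8000))
        print("Port 8000 is free")
    except OSError:
        print("Port 8000 is already in use")
```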
If you encounter issues:

- Check the application logs in your terminal
- Verify all prerequisites are met
- Create an issue on GitHub with:
  - Error messages
  - Application logs
  - System information
## Notes

- Make sure Ollama is running locally if you want to use the LLM endpoint.
- Static files are served from the project directory root.
## License

Copyright © 2025 Yash Vinchhi. All rights reserved.

This software is proprietary. Unauthorized copying, modification, distribution, or use of this code, via any medium, is strictly prohibited without explicit permission.