A latency-optimized Django REST API for an audio-LLM-audio "sandwich" chatbot that performs Retrieval-Augmented Generation (RAG) over NCERT school textbooks. A minimal HTML demo is also included; it demonstrates audio processing compatible with Azure Speech Services.
The server is also ready to deploy on air-gapped servers, thanks to its support for local models.
- Ollama - Local models
- GPT-4
- FAISS
- LangChain
- Azure Document Intelligence
- Azure Speech Services
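
The audio-LLM-audio "sandwich" described above can be pictured as four stages: speech-to-text, retrieval, generation, and text-to-speech. The block below is only a minimal sketch of that flow, not this project's actual code. The class `SandwichPipeline`, its field names, and the prompt wording are all hypothetical, and each stage is passed in as a plain callable so that no specific Azure, FAISS, LangChain, or Ollama call is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SandwichPipeline:
    """Hypothetical audio -> text -> RAG-augmented LLM -> audio pipeline.

    Every stage is injected as a callable, so a hosted backend can be
    swapped for a local one without touching the orchestration logic.
    """

    speech_to_text: Callable[[bytes], str]      # assumed: wraps a speech recognizer
    retrieve: Callable[[str, int], List[str]]   # assumed: wraps a vector index lookup
    generate: Callable[[str], str]              # assumed: wraps a local or hosted LLM
    text_to_speech: Callable[[str], bytes]      # assumed: wraps a speech synthesizer
    top_k: int = 3                              # number of passages to retrieve

    def build_prompt(self, question: str, passages: List[str]) -> str:
        # Place the retrieved textbook passages ahead of the question.
        context = "\n\n".join(passages)
        return (
            "Answer using only the textbook excerpts below.\n\n"
            f"Excerpts:\n{context}\n\n"
            f"Question: {question}"
        )

    def answer(self, audio_in: bytes) -> bytes:
        # 1. Transcribe the spoken question.
        question = self.speech_to_text(audio_in)
        # 2. Fetch the top_k most relevant passages.
        passages = self.retrieve(question, self.top_k)
        # 3. Ask the LLM, grounded in the retrieved passages.
        reply = self.generate(self.build_prompt(question, passages))
        # 4. Synthesize the reply back to audio.
        return self.text_to_speech(reply)


if __name__ == "__main__":
    # Stub stages stand in for real services so the sketch runs offline.
    pipeline = SandwichPipeline(
        speech_to_text=lambda audio: audio.decode("utf-8"),
        retrieve=lambda query, k: ["Stub passage one.", "Stub passage two."][:k],
        generate=lambda prompt: f"(stub reply to a {len(prompt)}-character prompt)",
        text_to_speech=lambda text: text.encode("utf-8"),
    )
    print(pipeline.answer(b"What is photosynthesis?"))
```

Keeping the stages behind plain callables is one way to let the same orchestration run against either a hosted model or a local one, which is the property an air-gapped deployment needs.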