This project provides Speech-to-Text (STT) functionality for Uzbek-language audio files. It uses the islomov/navaistt_v1_medium model from Hugging Face.
The core STT model (islomov/navaistt_v1_medium) is optimized for processing audio segments up to 30 seconds long. This project extends its capability to transcribe longer audio files by:
- Splitting the input audio into 30-second chunks.
- Processing each chunk individually using the STT model.
- Combining the transcribed text from all chunks to produce the final transcription.
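The splitting step above can be sketched with plain array slicing (a minimal illustration only, assuming 16 kHz mono audio as a NumPy array; `split_into_chunks` is a hypothetical helper, not part of this project):

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed model sample rate
CHUNK_SECONDS = 30     # the model's maximum segment length

def split_into_chunks(waveform: np.ndarray,
                      sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> list[np.ndarray]:
    """Split a mono waveform into consecutive chunks of at most chunk_seconds each."""
    chunk_len = sample_rate * chunk_seconds
    return [waveform[i:i + chunk_len] for i in range(0, len(waveform), chunk_len)]

# 75 seconds of audio -> 3 chunks: 30 s, 30 s, and a final 15 s remainder
audio = np.zeros(75 * SAMPLE_RATE, dtype=np.float32)
chunks = split_into_chunks(audio)
```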
The STT functionality is powered by the islomov/navaistt_v1_medium model available on Hugging Face.
- Transcription of Uzbek-language audio.
- Handling of audio files longer than 30 seconds through automatic chunking and result aggregation.
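The result-aggregation step can be as simple as joining the per-chunk transcriptions in order (a sketch; `combine_transcripts` is a hypothetical helper, and the real project may treat chunk boundaries differently):

```python
def combine_transcripts(chunk_texts: list[str]) -> str:
    """Join per-chunk transcriptions in order, dropping empty results."""
    return " ".join(text.strip() for text in chunk_texts if text.strip())

parts = ["salom dunyo", "  bu sinov ", ""]
print(combine_transcripts(parts))  # → salom dunyo bu sinov
```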
- Python 3.x
- Libraries specified in `requirements.txt` (if available). Key libraries likely include:
  - transformers
  - torch
  - torchaudio
- Clone the repository (if applicable) or download the `main.py` file.
- Install dependencies:

```shell
pip install -r requirements.txt
# Or install the libraries manually, e.g.:
pip install transformers torch pydub
```
- Code usage (from `main.py`; `time` is imported and `NavaiSTT` is defined earlier in the file):

```python
if __name__ == "__main__":
    starting_time = time.time()
    audio_file = "audio.wav"
    transcriber = NavaiSTT()
    transcription = transcriber.transcribe(audio_file)
    print(f"Transcription: {transcription}")
    print(f"Time taken: {time.time() - starting_time:.2f} seconds")
```
- Run the script:

```shell
python main.py
```