Description
Currently, the supported voices are limited to OpenAI's own voices, so there is no way to use this crate with other OpenAI-compatible API providers that offer different voices.
See:
async-openai/async-openai/src/types/audio.rs
Lines 36 to 51 in 7964f86
#[derive(Debug, Default, Serialize, Deserialize, Clone, PartialEq)]
#[serde(rename_all = "lowercase")]
#[non_exhaustive]
pub enum Voice {
    #[default]
    Alloy,
    Ash,
    Ballad,
    Coral,
    Echo,
    Fable,
    Onyx,
    Nova,
    Sage,
    Shimmer,
}
I would like to propose adding support for other voices, similar to how it was done for models, by adding an Other enum variant.
See:
async-openai/async-openai/src/types/audio.rs
Lines 53 to 62 in 7964f86
#[derive(Debug, Default, Serialize, Deserialize, Clone, PartialEq)]
pub enum SpeechModel {
    #[default]
    #[serde(rename = "tts-1")]
    Tts1,
    #[serde(rename = "tts-1-hd")]
    Tts1Hd,
    #[serde(untagged)]
    Other(String),
}
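To make the proposal concrete, here is a rough, std-only sketch of what a Voice enum with a catch-all variant could look like. It is not the crate's actual code: the as_str method and the From<&str> impl are hypothetical helpers added only so the example runs without serde, and "custom_voice" in the usage note is a made-up name. In async-openai itself the same effect would presumably come from adding #[serde(untagged)] Other(String), as SpeechModel already does.

```rust
// Hypothetical sketch, not the crate's real definition. It only mirrors the
// wire strings a serde-based `Other(String)` variant would be expected to produce.
#[derive(Debug, Default, Clone, PartialEq)]
#[non_exhaustive]
pub enum Voice {
    #[default]
    Alloy,
    Ash,
    Ballad,
    Coral,
    Echo,
    Fable,
    Onyx,
    Nova,
    Sage,
    Shimmer,
    /// Any voice name that the built-in list does not cover.
    Other(String),
}

impl Voice {
    /// Returns the string that would go into the request's `voice` field.
    /// (Hypothetical helper; stands in for serde serialization here.)
    pub fn as_str(&self) -> &str {
        match self {
            Voice::Alloy => "alloy",
            Voice::Ash => "ash",
            Voice::Ballad => "ballad",
            Voice::Coral => "coral",
            Voice::Echo => "echo",
            Voice::Fable => "fable",
            Voice::Onyx => "onyx",
            Voice::Nova => "nova",
            Voice::Sage => "sage",
            Voice::Shimmer => "shimmer",
            Voice::Other(name) => name.as_str(),
        }
    }
}

impl From<&str> for Voice {
    /// Maps known names to their variants and everything else to `Other`.
    /// (Hypothetical helper; stands in for serde deserialization here.)
    fn from(name: &str) -> Self {
        match name {
            "alloy" => Voice::Alloy,
            "ash" => Voice::Ash,
            "ballad" => Voice::Ballad,
            "coral" => Voice::Coral,
            "echo" => Voice::Echo,
            "fable" => Voice::Fable,
            "onyx" => Voice::Onyx,
            "nova" => Voice::Nova,
            "sage" => Voice::Sage,
            "shimmer" => Voice::Shimmer,
            other => Voice::Other(other.to_string()),
        }
    }
}
```

With something like this in place, a provider-specific voice could be passed as Voice::Other("custom_voice".to_string()), while the existing variants and the Alloy default would keep working unchanged.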
Here is a minimal snippet of the code I used.
use async_openai::{
    Client,
    config::OpenAIConfig,
    types::{CreateSpeechRequestArgs, SpeechModel, Voice},
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let base_url = std::env::var("BASE_URL").unwrap_or("http://localhost:8001/v1/".into());
    let api_key = std::env::var("OPENAI_API_KEY").unwrap_or("sk-NO_NEED_FOR_REAL_KEY".into());
    let text = "Hello! Test Test Test!";

    let client = Client::with_config(
        OpenAIConfig::new()
            .with_api_key(api_key)
            .with_api_base(base_url),
    );

    let request = CreateSpeechRequestArgs::default()
        .input(text)
        .voice(Voice::Ash) // No way to set custom voice.
        .model(SpeechModel::Other(
            "speaches-ai/Kokoro-82M-v1.0-ONNX".to_string(),
        ))
        .build()?;

    let response = client.audio().speech(request).await?;
    response.save("./data/audio.mp3").await?;

    Ok(())
}
As the custom OpenAI-compatible provider, I used the latest (0.8.3) Docker container from https://speaches.ai/ (similar to Ollama, but for TTS/STT).