
Conversation

@lucataco (Contributor) commented Aug 7, 2025

Hello! This PR adds support for the Automatic Speech Recognition task type for Replicate models.

Example:

- [huggingface.co/openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- [replicate.com/openai/whisper](https://replicate.com/openai/whisper)
cc @hanouticelina

@zeke (Contributor) left a comment

Looks good to me. 👍🏼

@hanouticelina (Contributor) left a comment

Hi @lucataco, thanks a lot for the contribution! Could you also add the `automatic-speech-recognition` mapping for Replicate in

```ts
export const PROVIDERS: Record<InferenceProvider, Partial<Record<InferenceTask, TaskProviderHelper>>> = {
```

You can find the complete guideline for provider/task JS integration in the documentation here: https://huggingface.co/docs/inference-providers/register-as-a-provider#2-js-client-integration
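The requested registration can be sketched as follows. This is a simplified illustration, not the actual implementation: the types are reduced stand-ins for the real `InferenceProvider`/`InferenceTask`/`TaskProviderHelper` types, and the helper class body is an assumption.

```typescript
// Simplified sketch of registering the new ASR task for Replicate
// (the real record lives in packages/inference/src/lib/getProviderHelper.ts).
interface TaskProviderHelper {
	makeRoute(): string;
}

// Hypothetical stand-in for the real ReplicateAutomaticSpeechRecognitionTask.
class ReplicateAutomaticSpeechRecognitionTask implements TaskProviderHelper {
	makeRoute(): string {
		return "predictions"; // Replicate's predictions endpoint
	}
}

const PROVIDERS: Record<string, Partial<Record<string, TaskProviderHelper>>> = {
	replicate: {
		// New mapping added by this PR:
		"automatic-speech-recognition": new ReplicateAutomaticSpeechRecognitionTask(),
		// ...existing Replicate tasks (text-to-image, text-to-speech, ...) omitted
	},
};
```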

@lucataco (Contributor, Author)

Thank you for taking a look! I've added the mapping as specified.

Comment on lines +206 to +212:

```ts
const out = response?.output as
	| undefined
	| {
			transcription?: string;
			translation?: string;
			txt_file?: string;
	  };
```
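The union above can be narrowed to a single transcript string with a small helper along these lines. The helper name and the fallback order are our assumptions, not the PR's exact code:

```typescript
// Shape of Replicate's ASR output, per the snippet above.
interface ReplicateAsrOutput {
	transcription?: string;
	translation?: string;
	txt_file?: string;
}

// Prefer an inline transcription, fall back to a translation; a txt_file
// URL would require a follow-up fetch, so treat it as unsupported here.
function resolveTranscript(out: ReplicateAsrOutput | undefined): string {
	if (typeof out?.transcription === "string") return out.transcription;
	if (typeof out?.translation === "string") return out.translation;
	throw new Error("Received malformed response from Replicate ASR API");
}
```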

@hanouticelina (Contributor) left a comment

Thanks @lucataco for the PR! I pushed a commit to fix the response parsing part.
Also, I think the version is missing in the `providerId` defined in the Replicate model mapping: https://huggingface.co/api/partners/replicate/models. It should be
`"openai/whisper:8099696689d249cf8b122d833c36ac3f75505c666a395ca40ef26f68e7d3d16e"`. Could you update it accordingly? Thanks 🙏

@lucataco (Contributor, Author)

Oh, good catch, thank you! Yes, of course.
I've updated the mapping with the specified Whisper version here.

@zeke (Contributor) commented Aug 21, 2025

Gentle bump. Anything blocking getting this shipped?

@coyotte508 (Member)
merging @SBrandeis @hanouticelina

@coyotte508 coyotte508 requested a review from Copilot August 25, 2025 10:34
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds Automatic Speech Recognition (ASR) support for the Replicate provider in the inference package. It enables users to perform speech-to-text transcription using Replicate models like OpenAI's Whisper.

  • Implements ReplicateAutomaticSpeechRecognitionTask class to handle ASR requests for Replicate provider
  • Removes existing output validation from the generic ASR function to allow provider-specific handling
  • Registers the new ASR task in the provider configuration
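The audio input processing mentioned above can be sketched roughly like this. Replicate's predictions API accepts file inputs as data URIs, so the task can inline the audio Blob into the request; the helper names and exact payload shape here are assumptions, not the PR's code:

```typescript
// Convert an audio Blob into a data URI suitable for a Replicate payload.
// Assumes a Node 18+ environment, where Blob and Buffer are both global.
async function blobToDataUri(blob: Blob): Promise<string> {
	const base64 = Buffer.from(await blob.arrayBuffer()).toString("base64");
	return `data:${blob.type || "audio/wav"};base64,${base64}`;
}

// The prediction payload then pins the model version and passes the audio.
function preparePayload(version: string, audioDataUri: string) {
	return { version, input: { audio: audioDataUri } };
}
```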

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/inference/src/tasks/audio/automaticSpeechRecognition.ts | Removes generic output validation to allow provider-specific response handling |
| packages/inference/src/providers/replicate.ts | Implements new ASR task class with audio input processing and response parsing |
| packages/inference/src/lib/getProviderHelper.ts | Registers the new ASR task for the Replicate provider |


Comment on lines +187 to +188:

```ts
if (!blob || !(blob instanceof Blob)) {
	throw new Error("Audio input must be a Blob");
```

Copilot AI commented Aug 25, 2025

The error message 'Audio input must be a Blob' is not descriptive enough. Consider providing more context about expected input formats and how to convert them to Blob.

Suggested change:

```diff
 if (!blob || !(blob instanceof Blob)) {
-	throw new Error("Audio input must be a Blob");
+	throw new Error(
+		"Audio input must be a Blob (e.g., a File or Blob object from the browser). " +
+			"Received: " + (blob === undefined ? "undefined" : typeof blob) + ". " +
+			"To convert an ArrayBuffer or base64 string to a Blob, use: " +
+			"`new Blob([arrayBuffer], { type: 'audio/wav' })` or " +
+			"`fetch('data:audio/wav;base64,...').then(res => res.blob())`. " +
+			"See documentation for supported input formats."
+	);
```

@coyotte508 coyotte508 merged commit 166cd60 into huggingface:main Aug 25, 2025
4 checks passed
AlpineVibrations pushed a commit to aifx-art/huggingface.js that referenced this pull request Aug 25, 2025