WhisperTextStreamer token_ids must be a non-empty array of integers #1273

Open
1 of 5 tasks
SpeedyGonzaless opened this issue Apr 5, 2025 · 3 comments
Labels
bug Something isn't working

Comments

System Info

@huggingface/transformers 3.4.2

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I am using AutomaticSpeechRecognitionPipeline (automatic-speech-recognition). When I create a new WhisperTextStreamer using the tokenizer from this pipeline, I get the error:
"token_ids must be a non-empty array of integers"

This problem did not occur in versions before 3.4.0.

Reproduction

Define the pipeline:

const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
                dtype: {
                    encoder_model:
                        this.model === "onnx-community/whisper-large-v3-turbo"
                            ? "fp16"
                            : "fp32",
                    decoder_model_merged: 'q4',
                },
                device: 'webgpu',
                progress_callback,
            });

Then try to define the WhisperTextStreamer:

const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
        time_precision,
        on_chunk_start: (x) => {
            const offset = (chunk_length_s - stride_length_s) * chunk_count;
            chunks.push({
                text: "",
                timestamp: [offset + x, null],
                finalised: false,
                offset,
            });
        },
        token_callback_function: () => {
            start_time = start_time || performance.now();
            if (num_tokens++ > 0) {
                tps = (num_tokens / (performance.now() - start_time)) * 1000;
            }
        },
        callback_function: (x) => {
            if (chunks.length === 0) return;
            chunks.at(-1).text += x;
            console.log('chunk', chunks.at(-1).text);
            chrome.runtime.sendMessage({
                status: 'update',
                data: {chunks, tps},
            });
        },
        on_chunk_end: (x) => {
            const current = chunks.at(-1);
            current.timestamp[1] = x + current.offset;
            current.finalised = true;
        },
        on_finalize: () => {
            start_time = null;
            num_tokens = 0;
            chunk_count++;
        },
    });
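
To make the timestamp and throughput arithmetic in the callbacks above concrete, here is a minimal standalone sketch. The values `chunk_length_s = 30` and `stride_length_s = 5` are assumptions chosen for illustration (the issue does not state them), and `chunkOffset` and `tokensPerSecond` are hypothetical helper names, not part of the library.

```javascript
// Hypothetical helpers mirroring the arithmetic in the streamer callbacks.
// chunk_length_s = 30 and stride_length_s = 5 are assumed values,
// not taken from the issue.
function chunkOffset(chunk_length_s, stride_length_s, chunk_count) {
  // Each finished chunk advances the window by (chunk length - stride) seconds.
  return (chunk_length_s - stride_length_s) * chunk_count;
}

function tokensPerSecond(num_tokens, elapsed_ms) {
  // performance.now() reports milliseconds, hence the factor of 1000.
  return (num_tokens / elapsed_ms) * 1000;
}

console.log(chunkOffset(30, 5, 0));     // 0
console.log(chunkOffset(30, 5, 2));     // 50
console.log(tokensPerSecond(40, 2000)); // 20
```

Under these assumed values, a timestamp `x` reported inside the third chunk (`chunk_count = 2`) would be stored as `50 + x` seconds.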

SpeedyGonzaless added the bug label Apr 5, 2025

xenova (Collaborator) commented Apr 22, 2025

Hi there 👋

and when I try to define new WhisperTextStreamer using tokenizer from this pipeline I get error:
"token_ids must be a non-empty array of integers"

Does this mean the error occurs at construction, or when running the pipeline the first time?

There may be an edge case where the model stops generating, but we still attempt to decode (leading to an empty input), which I can try to investigate. Do you have a sample input or input file that causes this error?
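
As a rough illustration of the edge case hypothesized above, the following standalone sketch shows how decoding an empty token list could raise this error and how a guard would avoid it. Both `decode` and `safeDecode` are hypothetical stand-ins; only the error message is taken from the issue. This is not the library's implementation or its fix, and the thread does not confirm that this is the actual cause.

```javascript
// Hypothetical stand-in for a tokenizer decode step. Only the error message
// comes from the issue; the validation logic is assumed for illustration.
function decode(token_ids) {
  if (
    !Array.isArray(token_ids) ||
    token_ids.length === 0 ||
    !token_ids.every(Number.isInteger)
  ) {
    throw new Error("token_ids must be a non-empty array of integers");
  }
  // Placeholder rendering so the sketch has visible output.
  return token_ids.map((id) => `<${id}>`).join("");
}

// Hypothetical guard: skip decoding when no tokens were generated.
function safeDecode(token_ids) {
  if (token_ids.length === 0) return "";
  return decode(token_ids);
}

console.log(safeDecode([1, 2, 3])); // <1><2><3>
console.log(safeDecode([]) === ""); // true
```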


One (unlikely) possibility is that you are not awaiting the creation of the pipeline? The error message would be a bit strange in this case though 🤔

- const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
+ const transcriber = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
// ...
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
// ...
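
To show why the missing `await` matters, here is a minimal standalone sketch. `fakePipeline` is a hypothetical stand-in for an async factory and does not reproduce the library's behavior; the point is only that an un-awaited call yields a Promise, which has no `tokenizer` property.

```javascript
// Hypothetical stand-in for an async factory such as pipeline();
// it is not the library's implementation.
async function fakePipeline() {
  return { tokenizer: { name: "stub-tokenizer" } };
}

// Without await, the variable holds a Promise rather than the pipeline object.
const notAwaited = fakePipeline();
console.log(notAwaited instanceof Promise); // true
console.log(notAwaited.tokenizer);          // undefined

// Once the Promise resolves, the tokenizer is available.
notAwaited.then((transcriber) => {
  console.log(transcriber.tokenizer.name);  // stub-tokenizer
});
```

In this situation the streamer would be constructed with `undefined` in place of a tokenizer, which is a different failure from the empty-decode case discussed earlier.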

SpeedyGonzaless (Author) commented Apr 23, 2025

The error occurs while running the pipeline.
You can reproduce it using your repository (https://github.com/xenova/whisper-web/tree/experimental-webgpu)
Just update "@huggingface/transformers" to version "3.5.0" and use any file. I used this one:

video_en.mp4

PierreMesure commented May 8, 2025

Update: I just checked with 3.5.1 and the problem is still not solved.

@xenova, @fs-eire, @guschmue if you want to reproduce, you can try with my fork of Whisper-web and upgrade from 3.3.3 to any newer version. Any audio file and Whisper model will cause the problem.

Related issue: xenova/whisper-web#60
