WhisperTextStreamer token_ids must be a non-empty array of integers #1273

SpeedyGonzaless · 2025-04-05T23:49:41Z

System Info

@huggingface/transformers 3.4.2

Environment/Platform

Description

I am using AutomaticSpeechRecognitionPipeline (automatic-speech-recognition). When I define a new WhisperTextStreamer using the tokenizer from this pipeline, I get the error:
"token_ids must be a non-empty array of integers"

This problem did not occur on versions before 3.4.0.

Reproduction

Define pipeline:

const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
    dtype: {
        encoder_model:
            this.model === "onnx-community/whisper-large-v3-turbo"
                ? "fp16"
                : "fp32",
        decoder_model_merged: 'q4',
    },
    device: 'webgpu',
    progress_callback,
});

Then define the WhisperTextStreamer:

const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
        time_precision,
        on_chunk_start: (x) => {
            const offset = (chunk_length_s - stride_length_s) * chunk_count;
            chunks.push({
                text: "",
                timestamp: [offset + x, null],
                finalised: false,
                offset,
            });
        },
        token_callback_function: () => {
            start_time = start_time || performance.now();
            if (num_tokens++ > 0) {
                tps = (num_tokens / (performance.now() - start_time)) * 1000;
            }
        },
        callback_function: (x) => {
            if (chunks.length === 0) return;
            chunks.at(-1).text += x;
            console.log('chunk', chunks.at(-1).text);
            chrome.runtime.sendMessage({
                status: 'update',
                data: {chunks, tps},
            });
        },
        on_chunk_end: (x) => {
            const current = chunks.at(-1);
            current.timestamp[1] = x + current.offset;
            current.finalised = true;
        },
        on_finalize: () => {
            start_time = null;
            num_tokens = 0;
            chunk_count++;
        },
    });
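
For readers following the callbacks, here is a minimal sketch of the offset arithmetic used in on_chunk_start. The values 30 and 5 for chunk_length_s and stride_length_s are assumptions chosen for illustration (the issue does not state them), and chunkOffset is a hypothetical helper name.

```javascript
// Assumed values for illustration only; the issue does not state them.
const chunk_length_s = 30;
const stride_length_s = 5;

// Start offset (in seconds) of the chunk with the given zero-based index,
// using the same formula as the on_chunk_start callback above.
function chunkOffset(chunk_count) {
  return (chunk_length_s - stride_length_s) * chunk_count;
}

console.log([0, 1, 2].map(chunkOffset)); // [ 0, 25, 50 ]
```

Under these assumptions, each new chunk starts 25 seconds after the previous one, so timestamps reported by the streamer are shifted by that offset.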

xenova · 2025-04-22T14:56:22Z

Hi there 👋

> and when I try to define new WhisperTextStreamer using tokenizer from this pipeline I get error:
> "token_ids must be a non-empty array of integers"

Does this mean the error occurs at construction, or when running the pipeline the first time?

There may be an edge case where the model stops generating but we still attempt to decode (leading to an empty input), which I can try to investigate. Do you have a sample input file that causes this error?
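
To make the suspected edge case concrete, here is a rough sketch of how an empty decode input would produce this message, and how a caller-side guard could avoid it. The stubTokenizer below is a stand-in written for illustration, not the library's tokenizer, and safeDecode is a hypothetical helper, not part of @huggingface/transformers.

```javascript
// Stand-in tokenizer for illustration only. Its decode() mimics a check
// that rejects empty input with the error message from this issue.
const stubTokenizer = {
  decode(token_ids) {
    if (
      !Array.isArray(token_ids) ||
      token_ids.length === 0 ||
      !Number.isInteger(token_ids[0])
    ) {
      throw new Error("token_ids must be a non-empty array of integers");
    }
    return token_ids.map((id) => `<${id}>`).join("");
  },
};

// Hypothetical helper: return an empty string instead of calling decode()
// when there is nothing to decode.
function safeDecode(tokenizer, token_ids, options = {}) {
  if (!Array.isArray(token_ids) || token_ids.length === 0) return "";
  return tokenizer.decode(token_ids, options);
}
```

With this guard, safeDecode(stubTokenizer, []) returns an empty string, while calling stubTokenizer.decode([]) directly throws. Whether this is the actual cause here is not confirmed.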

One possibility (unlikely) is that you are not awaiting the creation of the pipeline. The error message would be a bit strange in this case though 🤔

- const transcriber = pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
+ const transcriber = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
// ...
const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
// ...
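
As a small illustration of why the missing await matters, the sketch below uses fakePipeline, a hypothetical stand-in for pipeline(...), assumed here only to be an async function that resolves to an object with a tokenizer property.

```javascript
// Hypothetical stand-in for `pipeline(...)`: an async factory that resolves
// to an object exposing a `tokenizer` property.
async function fakePipeline() {
  return { tokenizer: { name: "stub-tokenizer" } };
}

async function demo() {
  // Without `await`, the variable holds a pending Promise, not the pipeline,
  // so reading `.tokenizer` from it yields undefined.
  const notAwaited = fakePipeline();
  // With `await`, the resolved object and its tokenizer are available.
  const awaited = await fakePipeline();
  return {
    notAwaitedIsPromise: notAwaited instanceof Promise,
    notAwaitedTokenizer: notAwaited.tokenizer,
    awaitedTokenizer: awaited.tokenizer,
  };
}
```

In the un-awaited case, the streamer would be constructed with an undefined tokenizer, which would likely fail in a different way than the reported error.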

SpeedyGonzaless · 2025-04-23T18:09:29Z

The error occurs when running the pipeline, not at construction.
You can reproduce it using your repository (https://github.com/xenova/whisper-web/tree/experimental-webgpu)
Just update "@huggingface/transformers" to version "3.5.0" and use any file. I used this one:

video_en.mp4

PierreMesure · 2025-05-08T12:50:51Z

Update: I just checked with 3.5.1 and the problem is still not solved.

@xenova, @fs-eire, @guschmue if you want to reproduce, you can try with my fork of Whisper-web and upgrade from 3.3.3 to any newer version. Any audio file and Whisper model will cause the problem.

Related issue: xenova/whisper-web#60

SpeedyGonzaless added the "bug" label (Something isn't working) on Apr 5, 2025

PierreMesure mentioned this issue on May 12, 2025: 🐛 v3 crashes on iOS and macOS devices due to increasing memory usage #1242 (open)