Residual adapters in sherpa-onnx for streaming ASR #2406

@aanchan

I am working on a project that needs highly personalized models, but per-speaker data is limited. In the literature, residual adapters are a popular approach to this problem; they are trained per speaker (or per cohort, assuming a speaker belongs to a cohort with a particular kind of atypicality). Whisper models allow for adapters, and with the PEFT library adapter merging is trivial. However, given the 30s delay, I understand Whisper is not suited to streaming. NeMo models are also supported by sherpa-onnx and come with adapter support: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html. I've played around with adapting a couple of streaming models. I wanted to know which approach makes the most sense for incorporating adapters:

  1. Modify the model export and the sherpa-onnx runtime to support model adapters.
  2. Write a custom script with NeMo to merge the adapters, leaving the sherpa-onnx runtime untouched.
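
To make option 2 concrete, here is a minimal sketch of what "merging" means, assuming a LoRA-style low-rank linear adapter of the kind PEFT merges. All names (`merge_lora`, `apply`, the toy matrices) are hypothetical and are not taken from NeMo, PEFT, or sherpa-onnx. Note the assumption: this fold only works when the adapter is purely linear. A bottleneck residual adapter with a nonlinearity between its two projections cannot in general be folded into a single weight matrix like this, so it would likely have to be exported as extra layers in the graph instead.

```python
# Hypothetical sketch: fold a LoRA-style adapter into a base weight matrix.
# Names are illustrative only; nothing here is a NeMo/PEFT/sherpa-onnx API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def merge_lora(w, lora_a, lora_b, scale):
    """Return the merged weight W' = W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def apply(w, x):
    """Compute W @ x for a vector x given as a flat list."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

# Toy base weight (2x3) and a rank-1 adapter: A is 1x3, B is 2x1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]
B = [[2.0], [4.0]]
scale = 0.5  # plays the role of alpha / r
x = [1.0, 2.0, 3.0]

# Unmerged path: base output plus the scaled adapter branch.
adapter_out = apply(B, apply(A, x))
unmerged = [b + scale * a for b, a in zip(apply(W, x), adapter_out)]

# Merged path: one plain linear layer, which is what an unmodified
# runtime would see after export.
merged = apply(merge_lora(W, A, B, scale), x)
```

Both paths give the same output for this toy input, which is why a merged model could be exported and run without any runtime changes, at the cost of needing one exported model per speaker or cohort.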

Any advice would be appreciated. Are there any past experiments with adapters in sherpa-onnx, or is this anywhere on the roadmap?
