Description
I am working on a project that needs highly personalized models, but per-speaker data is limited. In the literature, residual adapters are a popular approach for this problem: they are trained per speaker, or per cohort, assuming each speaker belongs to a cohort with a certain kind of atypicality. Whisper models support adapters, and with the PEFT library, merging an adapter back into the base model is trivial; with the 30 s delay, though, I understand Whisper is not meant for streaming.

NeMo models are also supported by sherpa-onnx and come with adapter support: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html. I have played around with adapting a couple of streaming models and wanted to know which approach would make more sense for incorporating adapters:
- Modify the model export and the sherpa-onnx runtime to support adapters.
- Write a custom script with NeMo to merge the adapters into the base weights and leave the sherpa-onnx runtime untouched (a rough sketch of what I have in mind is below).
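To make option 2 concrete, here is a minimal sketch of the kind of merge I have in mind, assuming the per-speaker adapters are LoRA-style, i.e. purely linear, so they can be folded into the base weights (classic bottleneck residual adapters with a nonlinearity cannot be folded this way). The function name and tensor shapes are just for illustration:

```python
import torch

def fold_lora_into_linear(linear: torch.nn.Linear,
                          lora_A: torch.Tensor,   # shape: (rank, in_features)
                          lora_B: torch.Tensor,   # shape: (out_features, rank)
                          scale: float) -> None:
    """Fold W <- W + scale * (B @ A) in place, so the exported ONNX graph
    is identical to the base model and sherpa-onnx needs no changes."""
    with torch.no_grad():
        linear.weight.add_(scale * (lora_B @ lora_A))
```

After folding every adapted layer, the idea would be to run the usual export path for the streaming model unchanged, since the merged checkpoint has the same structure as the base model.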
Any advice would be appreciated. Are there any past experiments with adapters in sherpa-onnx, or is adapter support anywhere on the roadmap?
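For reference, this is roughly what the PEFT-based merge I mentioned looks like for the (non-streaming) Whisper case; the model name and adapter path are placeholders:

```python
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# Load the base Whisper model and attach a per-speaker LoRA adapter
# trained offline with PEFT.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base, "path/to/speaker_adapter")

# Fold the adapter weights into the base model, leaving a plain Whisper
# checkpoint with no extra modules at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("whisper-small-speaker-merged")
```

I would like to replicate this kind of workflow for a streaming NeMo model that can then be exported for sherpa-onnx.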