Description
I am working on a project that needs highly personalized models, but per-speaker data is limited. In the literature, residual adapters are a popular approach for this problem: they are trained per speaker, or per cohort, assuming each speaker belongs to a cohort with a certain kind of atypicality. Whisper models support adapters, and with the PEFT library, merging an adapter back into the base model is trivial; with the 30 s delay, though, I understand Whisper is not meant for streaming.

NeMo models are also supported by sherpa-onnx and come with adapter support: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html. I have played around with adapting a couple of streaming models and wanted to know which approach would make more sense for incorporating adapters:
- Modify the model export and the sherpa-onnx runtime to support adapters.
- Write a custom script with NeMo to merge the adapters into the base weights and leave the sherpa-onnx runtime untouched (a rough sketch of what I have in mind is below).
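To make option 2 concrete, here is a minimal sketch of the kind of merge I have in mind, assuming the per-speaker adapters are LoRA-style, i.e. purely linear, so they can be folded into the base weights (classic bottleneck residual adapters with a nonlinearity cannot be folded this way). The function name and tensor shapes are just for illustration:

```python
import torch

def fold_lora_into_linear(linear: torch.nn.Linear,
                          lora_A: torch.Tensor,   # shape: (rank, in_features)
                          lora_B: torch.Tensor,   # shape: (out_features, rank)
                          scale: float) -> None:
    """Fold W <- W + scale * (B @ A) in place, so the exported ONNX graph
    is identical to the base model and sherpa-onnx needs no changes."""
    with torch.no_grad():
        linear.weight.add_(scale * (lora_B @ lora_A))
```

After folding every adapted layer, the idea would be to run the usual export path for the streaming model unchanged, since the merged checkpoint has the same structure as the base model.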
Any advice would be appreciated. Are there any past experiments with adapters in sherpa-onnx, or is adapter support anywhere on the roadmap?
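For reference, this is roughly what the PEFT-based merge I mentioned looks like for the (non-streaming) Whisper case; the model name and adapter path are placeholders:

```python
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# Load the base Whisper model and attach a per-speaker LoRA adapter
# trained offline with PEFT.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base, "path/to/speaker_adapter")

# Fold the adapter weights into the base model, leaving a plain Whisper
# checkpoint with no extra modules at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("whisper-small-speaker-merged")
```

I would like to replicate this kind of workflow for a streaming NeMo model that can then be exported for sherpa-onnx.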