Initial action plans
Copying these things from the wav2vec2 repo for safe housekeeping.
- An immediate step could be to convert the fine-tuned model using the TFLite APIs. Post-training quantization, in particular, might be very useful (a minimal conversion sketch follows this list). Quantization-aware training might be even more helpful, but its support on TPUs is limited. I remember you had tried post-training quantization but the resulting model size was around 400 MB, and I had shared some thoughts around that. It might be a good idea to revisit post-training quantization in that case.
- Google Research recently published FRILL, which could be relevant for us. In short, they perform knowledge distillation into a smaller student model with careful design choices, combined with quantization-aware training.
- Meanwhile, if you have any other ideas that you think might be worth trying out, please feel free to share them. If we end up with anything concrete and novel, we could even target a publication.
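
For reference, here is a minimal sketch of post-training quantization with the TFLite converter. The SavedModel path, input shape, and representative dataset below are placeholder assumptions, not the actual fine-tuned Wav2Vec2 export; they would need to be adapted to the model's real signature.

```python
import tensorflow as tf

# Placeholder path to the fine-tuned Wav2Vec2 SavedModel (assumption).
SAVED_MODEL_DIR = "wav2vec2_finetuned_savedmodel"

def representative_dataset():
    # A small set of raw audio clips (16 kHz mono) from the fine-tuning data.
    # The shape here is only an assumption; match the model's input signature.
    for _ in range(100):
        yield [tf.random.normal([1, 246000], dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
# Dynamic-range quantization: weights are quantized to 8 bits at conversion time.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# For full-integer quantization, additionally supply calibration data.
converter.representative_dataset = representative_dataset

tflite_model = converter.convert()
with open("wav2vec2_ptq.tflite", "wb") as f:
    f.write(tflite_model)
```

Comparing the dynamic-range and full-integer variants against the earlier ~400 MB result might help pin down how much of that size comes from the weights themselves versus the conversion setup.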
Suggesting another important resource here: Knowledge distillation: A good teacher is patient and consistent. The paper introduces simple recipes for getting the best possible student model, but the study is based on image classification models. So it might be a fun exercise to think of ways in which this could be extended here.
A baseline approach to distil Wav2Vec2: Shrinking Bigfoot: Reducing wav2vec 2.0 footprint
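
To make the distillation direction concrete, here is a minimal sketch of a teacher-student training step: a temperature-softened KL term against the frozen teacher's logits plus the usual hard-label loss. The `teacher`, `student`, `hard_loss_fn`, and hyperparameters are placeholder assumptions, not what FRILL or Shrinking Bigfoot actually use; both papers layer their own design choices on top of this basic recipe.

```python
import tensorflow as tf

temperature = 2.0  # softens the logits; typical distillation hyperparameter (assumption)
alpha = 0.5        # weight on the distillation term vs. the hard-label term (assumption)

kld = tf.keras.losses.KLDivergence()
optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def distill_step(speech_batch, labels, teacher, student, hard_loss_fn):
    # The frozen teacher provides soft targets; only the student is updated.
    teacher_logits = teacher(speech_batch, training=False)
    with tf.GradientTape() as tape:
        student_logits = student(speech_batch, training=True)
        soft_loss = kld(
            tf.nn.softmax(teacher_logits / temperature, axis=-1),
            tf.nn.softmax(student_logits / temperature, axis=-1),
        ) * (temperature ** 2)
        hard_loss = hard_loss_fn(labels, student_logits)
        loss = alpha * soft_loss + (1.0 - alpha) * hard_loss
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```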
Other useful resources
Model Optimization
Efficient Methods and Hardware for Deep Learning by Song Han
Lecture on Quantization by Pete Warden
For non-trivial model conversions in TFLite, you can refer to the following repositories:
https://github.com/tulasiram58827/ocr_tflite/
https://github.com/tulasiram58827/TTS_TFLite
https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite