Hi,
I was wondering what exactly the line optimizer.param_groups[0]['params'] = model.parameters() in init_prompts() does (see here). It seems to be an attempt to adapt the optimizer to the changed network. However, the old and new parameter collections differ considerably in shape and content, so it is unclear to me which tensors the optimizer actually uses afterwards.
Because all provided configs set "reinit_optimizer": true, the optimizer is re-created right afterwards, so this assignment is immediately overwritten and has no effect. With "reinit_optimizer": false, however, the assignment strongly affects training.
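A side note on the assignment itself: as far as I understand, model.parameters() returns a generator rather than a list, so whatever iterates over the param group first consumes it and every later pass sees no parameters at all. Below is a minimal sketch of that behavior. The toy nn.Linear model and the plain SGD optimizer are my own stand-ins, not the repository's setup, and I have not verified that this is what happens in the actual training loop.

```python
import torch
from torch import nn

# Toy stand-ins (assumptions): a tiny model and plain SGD.
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Same kind of assignment as in init_prompts(): a generator replaces the list.
group = optimizer.param_groups[0]
group["params"] = model.parameters()

# The first pass over the group consumes the generator ...
first_pass = sum(1 for _ in group["params"])   # weight and bias
# ... so every later pass, including the one inside step(), sees nothing.
second_pass = sum(1 for _ in group["params"])

# A regular training step now leaves the weights untouched.
before = model.weight.detach().clone()
optimizer.zero_grad()
model(torch.ones(1, 2)).sum().backward()
optimizer.step()
weights_changed = not torch.equal(before, model.weight.detach())

print(first_pass, second_pass, weights_changed)
```

This prints 2 0 False: the group yields both tensors once and nothing afterwards, and the step does not update the weights. If the same thing happens in the repository, it might explain why training effectively stops after the first task, but that is only a guess on my side.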
To show this, I ran two experiments on CIFAR, with and without reinit_optimizer. The average accuracy dropped from
CNN top1 curve: [96.6, 93.2, 90.87, 89.28, 86.72, 85.53, 85.04, 82.75, 81.08, 81.12]
CNN top5 curve: [99.8, 99.1, 98.7, 98.22, 98.1, 97.77, 97.59, 96.94, 96.27, 96.17]
to
CNN top1 curve: [96.6, 49.15, 32.77, 24.38, 19.5, 16.2, 13.77, 12.14, 10.77, 9.72]
CNN top5 curve: [99.8, 63.25, 42.07, 30.82, 25.84, 21.23, 18.71, 16.59, 14.52, 12.88]
So, how should the optimizer be adapted when the prompts are initialized?
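One option I could imagine, sketched below under my own assumptions, is to register only the newly created tensors through the optimizer's add_param_group() method, so that the existing groups and their state (for example momentum buffers) stay intact. The names model and new_prompt are hypothetical stand-ins, not identifiers from the repository, and whether init_prompts() adds tensors or replaces existing ones is something I have not checked.

```python
import torch
from torch import nn

# Toy stand-ins (assumptions): an existing model with its optimizer.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Hypothetical stand-in for a prompt tensor created in init_prompts().
new_prompt = nn.Parameter(torch.zeros(3, 4))

# Register only the new tensor as a list; missing options such as lr
# and momentum are filled in from the optimizer's defaults.
optimizer.add_param_group({"params": [new_prompt]})

# One training step that involves both the old and the new parameters.
optimizer.zero_grad()
loss = model(torch.ones(1, 4)).sum() + new_prompt.sum()
loss.backward()
optimizer.step()

print(len(optimizer.param_groups), optimizer.param_groups[1]["lr"])
```

If the prompts replace existing tensors instead of adding new ones, I assume rebuilding the optimizer, as reinit_optimizer already does, is the safer route, since the old optimizer state is keyed by the old tensors.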