Modify training script to account for stored metrics when resuming training #2286
Unanswered
CCanchilaM asked this question in Ideas
Replies: 0 comments
Problem
When resuming training from a checkpoint, the metric from the previously saved best model is ignored. CheckpointSaver is initialized with self.best_epoch = None and self.best_metric = None, completely ignoring previously saved results. The provided training script (train.py) does not provide an option to read these metrics.

I'm not sure if people are aware of this issue; at least I didn't realize it until I restarted a training run and the new best model had lower accuracy than previously reported.
Proposed solution
This can be easily solved with something like:
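A minimal sketch of the idea, not necessarily the exact snippet originally posted: after loading the checkpoint, seed the saver's best_metric and best_epoch from it before training continues. The helper name restore_best_metric is hypothetical, and the 'metric' and 'epoch' keys are an assumption about what the checkpoint dict contains; check them against your own checkpoints. A SimpleNamespace stands in for CheckpointSaver so the example runs on its own, and the numbers are made up.

```python
from types import SimpleNamespace


def restore_best_metric(saver, checkpoint):
    """Seed saver.best_metric / saver.best_epoch from a loaded checkpoint dict.

    Assumption: the checkpoint stores the eval metric under 'metric' and the
    epoch under 'epoch'. Returns True if the saver was updated.
    """
    metric = checkpoint.get("metric")
    if metric is None:
        # No metric stored: leave the saver at its defaults (None).
        return False
    saver.best_metric = metric
    saver.best_epoch = checkpoint.get("epoch")
    return True


# Stand-in for CheckpointSaver with only the two attributes named above.
saver = SimpleNamespace(best_metric=None, best_epoch=None)

# In train.py this dict would come from loading the checkpoint file,
# e.g. with torch.load(...). The values here are illustrative only.
checkpoint = {"epoch": 41, "metric": 78.3}

restore_best_metric(saver, checkpoint)
print(saver.best_epoch, saver.best_metric)
```

One thing to watch: if the checkpoint being resumed is the most recent one rather than the best one, its metric may be lower than the true best, so it is probably safer to read the metric from the saved best-model file.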
Please share if there is a better way to do this or if I missed something😅