Run validation at a fixed time interval (every N minutes) #13324

@paantya

🚀 Feature

Run validation at a fixed time interval (e.g. every 15 minutes).

Motivation

I would like to be able to run validation once per fixed period of time.
I know how long validation takes, and so that it does not eat too much of the training time, I want to be able to trigger it once per specified time interval.

For example, my validation takes 5 minutes (on the CPU, because this is RL). I then want to run it once after every 15 minutes of training, so that the time spent on validation is no more than 25% of the total time (5 minutes out of every 20).
The reason is that training runs at different speeds on different machines and servers. If validation is scheduled by the number of iterations, it may happen either too often, so that most of the time is spent on it, or too rarely, when we could afford to do it more often if we went by the time elapsed since the previous validation.

Yes, I understand that this is not a universal solution, but it is better than counting the number of iterations, as we do now.

When the network architecture changes, or even during model selection, the time per iteration changes in a more complicated way.

The training itself takes place on the GPU.
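To make the arithmetic in the example above concrete, here is a small sketch comparing the share of wall-clock time spent on validation under a time-based schedule and under an iteration-based one. The function names and the per-iteration speeds (0.1 s and 3 s) are hypothetical values chosen for illustration; only the 5-minute validation and the 15-minute interval come from the example above.

```python
def validation_fraction(train_seconds: float, val_seconds: float) -> float:
    """Share of one train+validate cycle that is spent validating."""
    return val_seconds / (train_seconds + val_seconds)


def iteration_based_fraction(
    iters_between_val: int, sec_per_iter: float, val_seconds: float
) -> float:
    """Same share, when validation is scheduled every N iterations."""
    train_seconds = iters_between_val * sec_per_iter
    return validation_fraction(train_seconds, val_seconds)


VAL_SECONDS = 5 * 60  # validation takes 5 minutes

# Time-based: 15 minutes of training between validations, on any machine.
time_based = validation_fraction(15 * 60, VAL_SECONDS)

# Iteration-based: validate every 1000 iterations (hypothetical setting).
fast_machine = iteration_based_fraction(1000, 0.1, VAL_SECONDS)  # 0.1 s/iter
slow_machine = iteration_based_fraction(1000, 3.0, VAL_SECONDS)  # 3 s/iter

print(f"time-based:          {time_based:.2%}")
print(f"iterations, fast GPU: {fast_machine:.2%}")
print(f"iterations, slow GPU: {slow_machine:.2%}")
```

With these assumed speeds, the same iteration count gives a very different validation share on the two machines, while the time-based schedule stays at the same share everywhere.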

Pitch

Allow setting a time interval, in the same way as is done with max_time, after which validation is called. Alternatively, this can be tied to the end of the epoch: roughly speaking, once the timer has expired, validation runs as soon as the current epoch is over. There could even be an additional flag that chooses between running validation as soon as the time expires and running it only when the time has expired and the epoch has ended.
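The behaviour described in this pitch could be sketched roughly as follows. This is a framework-independent illustration, not an existing Lightning API: the class name `ValidationTimer`, the `wait_for_epoch_end` flag, and the injectable `clock` are all hypothetical names introduced here. The timer restarts only after validation finishes, so validation time itself does not count towards the interval.

```python
import time
from datetime import timedelta
from typing import Callable


class ValidationTimer:
    """Decides whether validation is due, based on elapsed wall-clock time.

    Hypothetical helper: a training loop would call `should_validate`
    after each batch (and at each epoch end) and `mark_validated`
    once validation has finished.
    """

    def __init__(
        self,
        interval: timedelta,
        wait_for_epoch_end: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval_s = interval.total_seconds()
        self.wait_for_epoch_end = wait_for_epoch_end
        self.clock = clock
        self.last_validation = clock()

    def should_validate(self, epoch_ended: bool = False) -> bool:
        elapsed = self.clock() - self.last_validation
        if elapsed < self.interval_s:
            return False  # the timer has not expired yet
        if self.wait_for_epoch_end and not epoch_ended:
            return False  # timer expired, but hold off until the epoch ends
        return True

    def mark_validated(self) -> None:
        # Restart the timer after validation completes.
        self.last_validation = self.clock()


# Demo with a fake clock so that no real waiting is needed.
fake_now = [0.0]
timer = ValidationTimer(
    timedelta(minutes=15), wait_for_epoch_end=True, clock=lambda: fake_now[0]
)

fake_now[0] = 10 * 60
print(timer.should_validate(epoch_ended=True))   # interval not reached yet
fake_now[0] = 16 * 60
print(timer.should_validate(epoch_ended=False))  # expired, epoch still running
print(timer.should_validate(epoch_ended=True))   # expired and epoch over
timer.mark_validated()
```

Setting `wait_for_epoch_end=False` corresponds to the first variant in the pitch (validate as soon as the time expires); `True` corresponds to the variant tied to the end of the epoch.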

Alternatives

Described above.

Additional context

Slack link: https://pytorch-lightning.slack.com/archives/CRBLFHY79/p1655476166180589


Thank you very much for Lightning, we love it and use it a lot ❤️

cc @Borda @justusschock @kaushikb11 @awaelchli @rohitgr7 @ninginthecloud
