Various DDP improvements

williamFalcon released this 16 Sep 14:54
· 9191 commits to master since this release

This release does the following:

  • Moves SLURM resubmission from test-tube into PyTorch Lightning (PL), which removes the need for the cluster parameter.
  • Cluster checkpointing is now handled by Lightning rather than test-tube. Restoring weights on a cluster also no longer requires a checkpoint object.
  • Loads all models onto the CPU when restoring weights, to avoid out-of-memory (OOM) issues in PyTorch. Users who restore weights outside Lightning must now move the model to the GPU manually; when restoring through Lightning, the model is moved to the correct GPUs automatically.
  • Fixes various subtle bugs in the DDP implementation.
  • Updates the documentation.
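
For users restoring weights by hand, the CPU-first pattern described above can be sketched in plain PyTorch as follows. This is an illustration only, not Lightning's internal code: the `nn.Linear` model, the in-memory buffer standing in for a checkpoint file, and the `"state_dict"` key are all assumptions made for the example.

```python
import io

import torch
from torch import nn

# Hypothetical stand-in for a trained model; any nn.Module works the same way.
model = nn.Linear(4, 2)

# Save a checkpoint to an in-memory buffer (a file path would work equally well).
buffer = io.BytesIO()
torch.save({"state_dict": model.state_dict()}, buffer)
buffer.seek(0)

# map_location="cpu" places every tensor on the CPU while loading, so a
# checkpoint saved from a GPU does not allocate GPU memory during restore.
checkpoint = torch.load(buffer, map_location="cpu")

restored = nn.Linear(4, 2)
restored.load_state_dict(checkpoint["state_dict"])

# Outside Lightning, moving the model to the GPU is a manual step.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
restored = restored.to(device)
```

Loading onto the CPU first keeps the restore step independent of whichever device the checkpoint was saved from; the final `.to(device)` call is the manual move that the release notes say is needed when Lightning is not managing the model.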