Expose `weights_only` for loading checkpoints with `Trainer`, `LightningModule`, `LightningDataModule` #21072
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master   #21072   +/-   ##
=======================================
  Coverage      87%      87%
=======================================
  Files         269      269
  Lines       23515    23519       +4
=======================================
+ Hits        20500    20504       +4
  Misses       3015     3015
```
The branch was force-pushed from d7cb702 to 601e300 (commit: “… based on ckpt version”).
```diff
@@ -56,11 +56,17 @@ def _load_from_checkpoint(
     map_location: _MAP_LOCATION_TYPE = None,
     hparams_file: Optional[_PATH] = None,
     strict: Optional[bool] = None,
+    weights_only: Optional[bool] = None,
```
Should we default to `weights_only=None` or `weights_only=True`? If we have no use for `weights_only=None`, we can simplify the type hint to `weights_only: bool = True`.
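To make the trade-off concrete, here is a minimal sketch of the two signatures under discussion (hypothetical stubs, not Lightning's actual code):

```python
from typing import Optional

# Option A: tri-state default. None means "unspecified", so the loader can
# resolve the effective value later (e.g. from the installed torch version).
def load_a(path: str, weights_only: Optional[bool] = None) -> None:
    resolved = True if weights_only is None else weights_only
    print(f"loading {path} with weights_only={resolved}")

# Option B: if None is never needed, the hint simplifies to a plain bool.
def load_b(path: str, weights_only: bool = True) -> None:
    print(f"loading {path} with weights_only={weights_only}")
```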
```diff
@@ -45,7 +46,12 @@ def test_load_legacy_checkpoints(tmp_path, pl_version: str):
     assert path_ckpts, f'No checkpoints found in folder "{PATH_LEGACY}"'
     path_ckpt = path_ckpts[-1]

     model = ClassificationModel.load_from_checkpoint(path_ckpt, num_features=24)
+    # legacy load utility added in 1.5.0 (see https://github.com/Lightning-AI/pytorch-lightning/pull/9166)
+    if pl_version == "local":
```
This is the simplest way I could think of to ensure we continue testing the legacy checkpoints. Another way could be to use `torch.serialization.add_safe_globals`, but it seems a little more complicated (particularly since we're already using the `pl_legacy_patch` context manager).
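For reference, a hedged sketch of that `add_safe_globals` route (both APIs exist in recent torch versions; the legacy class name here is made up):

```python
import torch
from torch.serialization import add_safe_globals, safe_globals

class LegacyHparams:  # stand-in for whatever custom class old checkpoints pickled
    pass

# Allow-list the class process-wide...
add_safe_globals([LegacyHparams])
ckpt = torch.load("legacy.ckpt", weights_only=True)

# ...or only within a scope, which would sit alongside pl_legacy_patch
with safe_globals([LegacyHparams]):
    ckpt = torch.load("legacy.ckpt", weights_only=True)
```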
The PR title was changed from “`weights_only=True` by default” to “`weights_only=True` by default for loading weights”.
@Borda I wanted to get your opinion on something before moving forward. I've added […]. My issue right now is with resuming training from a checkpoint with […].

I'm leaning towards option 1, but it involves changing up […].
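For context, a rough sketch of what option 1 could look like (every name below is an assumption; the comment's actual option descriptions were lost in extraction):

```python
from typing import Optional

class Trainer:
    # Hypothetical: thread a weights_only argument from the public entry
    # point down to the restore path, mirroring load_from_checkpoint.
    def fit(self, model, ckpt_path: Optional[str] = None,
            weights_only: Optional[bool] = None) -> None:
        if ckpt_path is not None:
            self._restore(ckpt_path, weights_only)
        # ... run the training loop ...

    def _restore(self, ckpt_path: str, weights_only: Optional[bool]) -> None:
        pass  # would eventually forward weights_only to torch.load
```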
The cleanest way would probably be 1), but it brings so many new arguments for a marginal use... so personally I would go with 2).
@Borda I will go ahead with changing all cases of […] to use `weights_only=True`. This will cause a lot of errors with checkpoints from previous versions, so I'll update the docs/warning messages as well to inform users to use either the context manager or the global environment variable.
Hi @matsumotosan, let's do that only if the underlying torch is >= 2.6 (since `weights_only` became `True` by default starting from that version); otherwise we're going to break a lot of older code.
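A minimal sketch of that version gate (the helper name and the `None`-resolution rule are assumptions; only the torch >= 2.6 condition comes from the comment above):

```python
from typing import Optional

import torch
from packaging.version import Version

_TORCH_GE_2_6 = Version(torch.__version__).release >= (2, 6)

def _resolve_weights_only(weights_only: Optional[bool]) -> bool:
    # Mirror torch's own default flip: torch.load defaults to
    # weights_only=True starting with torch 2.6.
    if weights_only is None:
        return _TORCH_GE_2_6
    return weights_only
```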
* try `deepspeed >=0.14.1,<=0.15.0`
* drop from oldest
* pip uninstall -y deepspeed
* error::DeprecationWarning
I am not sure if it's possible to default to […].

The big issue with context managers is that a different one has to be used each time a different checkpoint is loaded. Setting the environment variable […].

With this in mind, I think passing […]. If we need to force the […]. I have also added […].
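To spell out the three mechanisms being weighed (`TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD` is torch's real escape hatch for forcing `weights_only=False`; the class and file names are made up):

```python
import os

import torch
from torch.serialization import safe_globals

class OldMetric:  # stand-in for a non-tensor object pickled into a checkpoint
    pass

# 1) Context manager: scoped, but each differently-shaped checkpoint needs
#    its own allow-list at every load site.
with safe_globals([OldMetric]):
    ckpt = torch.load("a.ckpt", weights_only=True)

# 2) Environment variable: one switch for the whole process; it relaxes the
#    weights_only safety everywhere, not just for one checkpoint.
os.environ["TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD"] = "1"

# 3) Explicit argument: local to a single call, no shared state.
ckpt = torch.load("b.ckpt", weights_only=False)
```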
Maybe we could add […] by default.
The PR title was changed from “`weights_only=True` by default for loading weights” to “Expose `weights_only` for loading checkpoints with `Trainer`, `LightningModule`, `LightningDataModule`”.
What does this PR do?
Fixes #20450, #20058, #20643
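A hedged usage sketch of what the PR exposes (the `load_from_checkpoint` argument matches the diff above; the `Trainer.fit` spelling is an assumption based on the PR title):

```python
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    pass  # model definition omitted

# LightningModule side: weights_only is forwarded to torch.load
model = LitModel.load_from_checkpoint("last.ckpt", weights_only=True)

# Trainer side (assumed spelling): same control when resuming training
trainer = pl.Trainer(max_epochs=20)
trainer.fit(model, ckpt_path="last.ckpt", weights_only=True)
```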
Before submitting

PR review

Anyone in the community is welcome to review the PR. Before you start reviewing, make sure you have read the review guidelines.

Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--21072.org.readthedocs.build/en/21072/