-
Notifications
You must be signed in to change notification settings - Fork 8
Implement ModelMetadata
storing in checkpoints
#592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces functionality to attach and load ModelMetadata objects within model checkpoints and ensures that the export and load procedures for checkpoints incorporate metadata seamlessly.
- Metadata is now stored directly under the "metadata" key in checkpoints.
- The naming convention for default metadata has been updated (from "default_metadata" to "metadata") across different models and trainers.
- The command‐line export interface has been updated to support metadata attachment for checkpoint files.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/cli/test_export_model.py | Updated tests to use checkpoint-based model files instead of direct model instantiation. |
src/metatrain/utils/metadata.py | Revised metadata merging to avoid duplicating references. |
src/metatrain/soap_bpnn/trainer.py, pet/trainer.py, etc. | Updated checkpoint saving to include metadata from model.metadata. |
src/metatrain/soap_bpnn/model.py, pet/model.py, gap/model.py, nanopet/model.py | Renamed default_metadata to metadata and updated metadata handling in load/export routines. |
src/metatrain/cli/export.py | Modified argument handling and export_model logic to support metadata on both checkpoint (.ckpt) and .pt exports. |
Comments suppressed due to low confidence (3)
src/metatrain/soap_bpnn/model.py:173
- [nitpick] The renaming from default_metadata to metadata improves clarity; ensure that the change is consistently reflected in documentation and all related modules.
__metadata__ = ModelMetadata(
tests/cli/test_export_model.py:18
- The removal of the unused model import cleans up the test file; ensure all tests only use the checkpoint-based export approach as intended.
-from metatrain.soap_bpnn import __model__
src/metatrain/cli/export.py:144
- [nitpick] Verify that the merge order in the metadata update is intended; currently, the new metadata is being merged into the checkpoint metadata and then assigned back, which may be confusing. Consider adding a comment or refactoring to make the merge precedence explicit.
append_metadata_references(metadata, current_metadata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall, but is still missing a test that create a checkpoint with metadata, and then re-export to .pt
, checking that the metadata is correctly defined
@@ -17,6 +17,8 @@ def append_metadata_references(self: ModelMetadata, other: ModelMetadata) -> Non | |||
if key not in self_dict["references"]: | |||
self_dict["references"][key] = values | |||
else: | |||
self_dict["references"][key] += values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we turn this function into a more general merge_metadata(existing, new)
? So more than just the references can be merged when fine-tuning
output: Union[Path, str], | ||
extensions: Union[Path, str] = "extensions/", | ||
token: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention that it is specific to HF.
token: Optional[str] = None, | |
hf_token: Optional[str] = None, |
As the title suggests, this PR adds the functionality to store the
ModelMetadata
objects inside the model checkpoints, and allows to runmtt export model.ckpt -o model-with-metadata.ckpt -m metadata.yaml
as a valid command to attach the metadata to the existing.ckpt
file.mtt export
still works properly for exporting in.pt
fileModel.load_checkpoint()
properly stores the metadata that was read from fileTrainer.save_checkpoint()
saves themodel.__metadata__
Contributor (creator of pull-request) checklist
Reviewer checklist
📚 Documentation preview 📚: https://metatrain--592.org.readthedocs.build/en/592/