Skip to content

Implement ModelMetadata storing in checkpoints #592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

abmazitov
Copy link
Contributor

@abmazitov abmazitov commented May 9, 2025

As the title suggests, this PR adds the functionality to store the ModelMetadata objects inside the model checkpoints, and allows to run mtt export model.ckpt -o model-with-metadata.ckpt -m metadata.yaml as a valid command to attach the metadata to the existing .ckpt file.

  • Metadata attachment to the checkpoint works
  • mtt export still works properly for exporting in .pt file
  • Model.load_checkpoint() properly stores the metadata that was read from file
  • Trainer.save_checkpoint() saves the model.__metadata__

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?

Reviewer checklist

  • CHANGELOG updated with public API or any other important changes?

📚 Documentation preview 📚: https://metatrain--592.org.readthedocs.build/en/592/

@Copilot Copilot AI review requested due to automatic review settings May 9, 2025 12:58
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces functionality to attach and load ModelMetadata objects within model checkpoints and ensures that the export and load procedures for checkpoints incorporate metadata seamlessly.

  • Metadata is now stored directly under the "metadata" key in checkpoints.
  • The naming convention for default metadata has been updated (from "default_metadata" to "metadata") across different models and trainers.
  • The command‐line export interface has been updated to support metadata attachment for checkpoint files.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/cli/test_export_model.py Updated tests to use checkpoint-based model files instead of direct model instantiation.
src/metatrain/utils/metadata.py Revised metadata merging to avoid duplicating references.
src/metatrain/soap_bpnn/trainer.py, pet/trainer.py, etc. Updated checkpoint saving to include metadata from model.metadata.
src/metatrain/soap_bpnn/model.py, pet/model.py, gap/model.py, nanopet/model.py Renamed default_metadata to metadata and updated metadata handling in load/export routines.
src/metatrain/cli/export.py Modified argument handling and export_model logic to support metadata on both checkpoint (.ckpt) and .pt exports.
Comments suppressed due to low confidence (3)

src/metatrain/soap_bpnn/model.py:173

  • [nitpick] The renaming from default_metadata to metadata improves clarity; ensure that the change is consistently reflected in documentation and all related modules.
__metadata__ = ModelMetadata(

tests/cli/test_export_model.py:18

  • The removal of the unused model import cleans up the test file; ensure all tests only use the checkpoint-based export approach as intended.
-from metatrain.soap_bpnn import __model__

src/metatrain/cli/export.py:144

  • [nitpick] Verify that the merge order in the metadata update is intended; currently, the new metadata is being merged into the checkpoint metadata and then assigned back, which may be confusing. Consider adding a comment or refactoring to make the merge precedence explicit.
append_metadata_references(metadata, current_metadata)

@abmazitov abmazitov requested a review from Luthaf May 9, 2025 13:01
Copy link
Member

@Luthaf Luthaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, but is still missing a test that create a checkpoint with metadata, and then re-export to .pt, checking that the metadata is correctly defined

@@ -17,6 +17,8 @@ def append_metadata_references(self: ModelMetadata, other: ModelMetadata) -> Non
if key not in self_dict["references"]:
self_dict["references"][key] = values
else:
self_dict["references"][key] += values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we turn this function into a more general merge_metadata(existing, new)? So more than just the references can be merged when fine-tuning

output: Union[Path, str],
extensions: Union[Path, str] = "extensions/",
token: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should mention that it is specific to HF.

Suggested change
token: Optional[str] = None,
hf_token: Optional[str] = None,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants