Skip to content

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Aug 27, 2025

This PR implements automatic git repository detection and metadata attachment during model uploads to provide complete traceability of model versions to their source code state.

What's Changed

When using clarifai model upload, the ModelBuilder now automatically:

  1. Detects git repository information - Checks if the model path is within a git repository and extracts:

    • Repository URL (from remote.origin.url)
    • Current commit hash
    • Current branch name
  2. Warns about uncommitted changes in model directory only - Detects uncommitted changes using git status --porcelain . limited to the model path and prompts the user:

    Uncommitted changes detected in model path:
    ?? my_model/uncommitted_file.txt
    
    Do you want to continue upload with uncommitted changes? (y/N):
    
  3. Attaches git metadata to model version - Includes git information in the model version metadata under the git_registry key for full traceability.

Key Features

  • Model-scoped git checking: Only checks for uncommitted changes within the model directory, ignoring changes elsewhere in the repository
  • Graceful error handling: Works seamlessly in non-git directories and handles git command failures
  • Windows compatibility: Test cleanup handles Windows file permission issues with git directories
  • Minimal impact: Only ~100 lines of code added with comprehensive test coverage

Example Usage

# Before upload, git info is automatically detected and logged
logger.info("Detected git repository: https://github.com/user/ml-model.git")
logger.info("Current commit: abc123def456789...")
logger.info("Current branch: main")

# Only uncommitted changes in model path trigger prompt
# Git metadata is attached to model version proto automatically

Implementation Details

  • Added 3 new methods to ModelBuilder class for git integration
  • Uses subprocess to safely interact with git commands with proper error handling
  • Git status checking limited to model directory using git status --porcelain . with cwd=model_folder
  • Windows-compatible test cleanup with progressive fallback strategy
  • All existing functionality preserved - git features are purely additive

Testing

Added comprehensive unit tests covering:

  • Git and non-git environments
  • Clean and dirty repository states within model scope
  • Files outside model directory being correctly ignored
  • User acceptance/rejection workflows
  • Metadata attachment and retrieval
  • Windows-compatible cleanup for git directories
  • All existing tests continue to pass

Fixes #761.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: zeiler <2138258+zeiler@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] Add git registry information during model upload Add git registry information during model upload Aug 27, 2025
@Copilot Copilot AI requested a review from zeiler August 27, 2025 05:22
Copilot finished work on behalf of zeiler August 27, 2025 05:22
@zeiler
Copy link
Member

zeiler commented Aug 27, 2025

@copilot i noticed that it checks for any uncommited files in the entire repo, not just within model_path. Can you update to check within the model_path only please.

Co-authored-by: zeiler <2138258+zeiler@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Aug 27, 2025

@copilot i noticed that it checks for any uncommited files in the entire repo, not just within model_path. Can you update to check within the model_path only please.

Fixed! The git status checking is now limited to the model path only. Modified _check_git_status_and_prompt() to use git status --porcelain . with cwd=self.folder, which only checks for uncommitted changes within the model directory rather than the entire repository. Added a comprehensive test case to verify this behavior works correctly.

Commit: 32c9622

@Copilot Copilot AI changed the title Add git registry information during model upload Add git registry information during model upload with model-scoped change detection Aug 27, 2025
Copilot finished work on behalf of zeiler August 27, 2025 16:22
@zeiler zeiler marked this pull request as ready for review August 27, 2025 17:17
@zeiler zeiler requested review from Copilot and ackizilkale August 27, 2025 17:17
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements automatic git repository detection and metadata attachment during model uploads to provide complete traceability of model versions to their source code state. The implementation adds git integration features that automatically detect repository information, warn about uncommitted changes in the model directory only, and attach git metadata to model versions.

Key changes include:

  • Added three new git-related methods to ModelBuilder class for repository detection and status checking
  • Implemented model-scoped change detection that only checks for uncommitted files within the model directory
  • Added git metadata attachment to model version protobuf with repository URL, commit hash, and branch information

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
tests/runners/test_git_info.py Comprehensive test suite covering git and non-git environments, clean/dirty repository states, and metadata attachment
clarifai/runners/models/model_builder.py Core implementation adding git detection, status checking, and metadata attachment methods to ModelBuilder

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@zeiler
Copy link
Member

zeiler commented Sep 3, 2025

@copilot there are errors in the tests only on windows:

TestGitInfo.test_check_git_status_clean_repo _________________
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\shutil.py:787: in rmtree
return _rmtree_unsafe(path, onerror)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\shutil.py:629: in _rmtree_unsafe
_rmtree_unsafe(fullname, onerror)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\shutil.py:629: in _rmtree_unsafe
_rmtree_unsafe(fullname, onerror)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\shutil.py:629: in _rmtree_unsafe
_rmtree_unsafe(fullname, onerror)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\shutil.py:629: in _rmtree_unsafe
_rmtree_unsafe(fullname, onerror)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\shutil.py:634: in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())


path = 'C:\Users\RUNNER~1\AppData\Local\Temp\tmp132bcxue\test_model\.git\objects\24'
onerror = <function rmtree..onerror at 0x00000218C1358680>

def _rmtree_unsafe(path, onerror):
    try:
        with os.scandir(path) as scandir_it:
            entries = list(scandir_it)
    except OSError:
        onerror(os.scandir, path, sys.exc_info())
        entries = []
    for entry in entries:
        fullname = entry.path
        if _rmtree_isdir(entry):
            try:
                if entry.is_symlink():
                    # This can only happen if someone replaces
                    # a directory with a symlink after the call to
                    # os.scandir or entry.is_dir above.
                    raise OSError("Cannot call rmtree on a symbolic link")
            except OSError:
                onerror(os.path.islink, fullname, sys.exc_info())
                continue
            _rmtree_unsafe(fullname, onerror)
        else:
            try:
              os.unlink(fullname)

E PermissionError: [WinError 5] Access is denied: 'C:\Users\RUNNER~1\AppData\Local\Temp\tmp132bcxue\test_model\.git\objects\24\ce15ab7ead32f98c7ac3edcd34bb2010ff4326

Can you fix please

Add Windows-compatible cleanup function to handle read-only files in .git directories during test teardown. Uses progressive fallback strategy: normal removal -> permission fixing -> ignore errors.

Co-authored-by: zeiler <2138258+zeiler@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Sep 3, 2025

@copilot there are errors in the tests only on windows:

TestGitInfo.test_check_git_status_clean_repo _________________
...

Fixed the Windows test failures! The issue was with test cleanup of temporary directories containing .git files, which have read-only permissions on Windows. Added a Windows-compatible cleanup function that handles permission issues with progressive fallback strategy: normal removal → permission fixing → ignore errors. All git tests now pass cleanly.

Commit: c957bc6

Copilot finished work on behalf of zeiler September 3, 2025 02:50
Copy link

github-actions bot commented Sep 3, 2025

Code Coverage

Package Line Rate Health
clarifai 43%
clarifai.cli 43%
clarifai.cli.templates 28%
clarifai.client 68%
clarifai.client.auth 67%
clarifai.constants 100%
clarifai.datasets 100%
clarifai.datasets.export 80%
clarifai.datasets.upload 75%
clarifai.datasets.upload.loaders 37%
clarifai.models 100%
clarifai.modules 0%
clarifai.rag 72%
clarifai.runners 53%
clarifai.runners.models 60%
clarifai.runners.pipeline_steps 45%
clarifai.runners.pipelines 80%
clarifai.runners.utils 62%
clarifai.runners.utils.data_types 72%
clarifai.schema 100%
clarifai.urls 60%
clarifai.utils 56%
clarifai.utils.evaluation 67%
clarifai.workflows 95%
Summary 62% (7764 / 12553)

Minimum allowed line rate is 50%

@srikanthbachala20 srikanthbachala20 merged commit 3ea1209 into master Sep 3, 2025
12 checks passed
@srikanthbachala20 srikanthbachala20 deleted the copilot/fix-761 branch September 3, 2025 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Add git registry information during model upload
3 participants