MLFlow: updated logic for creating `compiled.json` #1274

peteski22 · 2025-03-21T18:49:24Z

What's changing

After discussions with @njbrake we decided that the structure of the compiled.json file probably needs to be adjusted in order to allow users to see metrics/parameters 'per job' that runs.

This PR changes the way this is done so that the compiled results store the job (run) ID as the key along with the results for that job as the values.

Updated get_job, _compile_metrics and _compile_parameters
Add exception for workflow run not found
Added unit tests

TODO: Still work in progress, we will need to make changes to the front end etc. to support this.

If this PR is related to an issue or closes one, please link it here.

Refs #...
Closes #...

How to test it

Steps to test the changes:

Additional notes for reviewers

Anything you'd like to add to help the reviewer understand the changes you're proposing.

I already...

Tested the changes in a working environment to ensure they work as expected
Added some tests for any new functionality
Updated the documentation (both comments in code and product documentation under /docs)
Checked if a (backend) DB migration step was required and included it if required

Updated get_job, _compile_metrics and _compile_parameters Add exception for workflow run not found Added unit tests

njbrake

Looks good! Thank you for the logic simplification and improvement 💪 . I think the frontend code may need to be adjusted to accommodate this change, but I don't think anything needs to change in the SDK 👍

…zation

njbrake

I'm not sure about why the get_job was turned into an async function, but other than that no issues. 😄

njbrake · 2025-03-28T00:26:40Z

lumigator/backend/backend/tracking/mlflow.py

        """Delete an experiment. Although Mflow has a delete_experiment method,
        We will use the functions of this class instead, so that we make sure we correctly
        clean up all the artifacts/runs/etc. associated with the experiment.
        """


What code calls get_job, and what is the impact of changing this to async?

Hey hey 😄, good question, let me try to explain the reasoning/history:

This is to align with main as the interface has been updated to make all the methods async.

Question: "Why do all the tracking interface methods need to be async?"

Answer: Currently, they don't ... but ...

Background:

In changing the implementation of the S3 provider from boto3 to s3fs, we had to call some async library methods to have similar functionality.

Initially we could just change the implementation of MLflowTrackingClient to use an async method, but the idea of the tracking client interface is that code can accept it as a parameter for dependency injection and 'not care' about implementation details.

In this case, the TrackingClient wouldn't suggest that consuming code should await anything it does, which will lead to some interesting return values.

Because we didn't want that, we updated the interface and the implementation to mark the methods in question as async, this then meant that the methods that made calls to the interface method, became async, and so on up the chain.

(Side note: an alternative is trying to semi-manually handle the async stuff in sync code using asyncio calls (like run) but I think they come with their own foot guns, and run is usually to bootstrap the entry-point into the async app, so I say just use the native language syntax to specify intention in the functions even if that means it pervades the calls up the stack)

Question: "So what does that have to do with non-async methods now being async?"

Answer: TL;DR:

Future proofing code refactoring

Allows underlying code to be sync or async

Type hinting for the IDE to remind people to await things

Interface doesn't lie about implementations

There's a small overhead of adding async/await to code/calls but it's negligible vs the benefits. (I believe)

Because we saw some of the pain in making that change, having to find all the call sites and update them to be async, up the stack.. It makes sense to pre-emptively mark the other methods as async.

Hope this helps!

MLFlow: updated logic for creating compiled.json

5b651a5

Updated get_job, _compile_metrics and _compile_parameters Add exception for workflow run not found Added unit tests

peteski22 requested a review from njbrake March 21, 2025 18:49

github-actions bot added the backend label Mar 21, 2025

Merge branch 'main' into mlflow/refactor/compiled-structure

b27ecab

njbrake reviewed Mar 22, 2025

View reviewed changes

peteski22 added 6 commits March 25, 2025 07:47

Merge branch 'main' into mlflow/refactor/compiled-structure

4f7ffb4

Small tweaks

2c07d0d

Merge branch 'main' into mlflow/refactor/compiled-structure

bb5a090

Merge branch 'main' into mlflow/refactor/compiled-structure

86a3124

Ignore jobs that are not found, or deleted. Fix compiled.json seriali…

8734661

…zation

Merge branch 'main' into mlflow/refactor/compiled-structure

ea0ceb6

njbrake reviewed Mar 28, 2025

View reviewed changes

Merge branch 'main' into mlflow/refactor/compiled-structure

c795097

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MLFlow: updated logic for creating `compiled.json` #1274

MLFlow: updated logic for creating `compiled.json` #1274

Uh oh!

peteski22 commented Mar 21, 2025

Uh oh!

njbrake left a comment

Uh oh!

njbrake left a comment

Uh oh!

njbrake Mar 28, 2025

Uh oh!

peteski22 Mar 28, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MLFlow: updated logic for creating compiled.json #1274

Are you sure you want to change the base?

MLFlow: updated logic for creating compiled.json #1274

Uh oh!

Conversation

peteski22 commented Mar 21, 2025

What's changing

How to test it

Additional notes for reviewers

I already...

Uh oh!

njbrake left a comment

Choose a reason for hiding this comment

Uh oh!

njbrake left a comment

Choose a reason for hiding this comment

Uh oh!

njbrake Mar 28, 2025

Choose a reason for hiding this comment

Uh oh!

peteski22 Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MLFlow: updated logic for creating `compiled.json` #1274

MLFlow: updated logic for creating `compiled.json` #1274

peteski22 Mar 28, 2025 •

edited

Loading