MLFlow: updated logic for creating compiled.json
#1274
base: main
Conversation
- Updated `get_job`, `_compile_metrics`, and `_compile_parameters`
- Added an exception for when a workflow run is not found (a rough sketch follows below)
- Added unit tests
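To make the not-found handling concrete, here is a minimal sketch; the exception name `WorkflowRunNotFoundError` and the helper `_get_run_or_raise` are illustrative assumptions, not the PR's actual code, and only show translating MLflow's lookup error into a domain exception.

```python
# Hedged sketch: WorkflowRunNotFoundError and _get_run_or_raise are illustrative
# assumptions, not the names used in this PR.
from mlflow.exceptions import MlflowException
from mlflow.tracking import MlflowClient


class WorkflowRunNotFoundError(Exception):
    """Raised when a workflow's MLflow run cannot be found."""


def _get_run_or_raise(client: MlflowClient, run_id: str):
    """Fetch a run by ID, turning MLflow's error into a clearer domain exception."""
    try:
        return client.get_run(run_id)
    except MlflowException as exc:
        raise WorkflowRunNotFoundError(f"Workflow run '{run_id}' was not found") from exc
```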
Looks good! Thank you for the logic simplification and improvement 💪 . I think the frontend code may need to be adjusted to accommodate this change, but I don't think anything needs to change in the SDK 👍
I'm not sure why `get_job` was turned into an async function, but other than that, no issues. 😄
"""Delete an experiment. Although Mflow has a delete_experiment method, | ||
We will use the functions of this class instead, so that we make sure we correctly | ||
clean up all the artifacts/runs/etc. associated with the experiment. | ||
""" |
What code calls get_job, and what is the impact of changing this to async?
Hey hey 😄, good question, let me try to explain the reasoning/history:
This is to align with `main`, as the interface has been updated to make all the methods async.
Question: "Why do all the tracking interface methods need to be async?"
Answer: Currently, they don't ... but ...
Background:
In changing the implementation of the S3 provider from `boto3` to `s3fs`, we had to call some async library methods to have similar functionality.
Initially we could have just changed the implementation of `MLflowTrackingClient` to use an `async` method, but the idea of the tracking client interface is that code can accept it as a parameter for dependency injection and 'not care' about implementation details.
In this case, the `TrackingClient` wouldn't suggest that consuming code should `await` anything it does, which would lead to some interesting return values (unawaited coroutine objects).
Because we didn't want that, we updated the interface and the implementation to mark the methods in question as `async`. This then meant that the methods calling those interface methods became `async` too, and so on up the chain.
(Side note: an alternative is trying to semi-manually handle the async stuff in sync code using `asyncio` calls (like `asyncio.run`), but I think those come with their own foot guns, and `asyncio.run` is usually meant to bootstrap the entry point into the async app. So I say just use the native language syntax to express intention in the functions, even if that means it pervades the calls up the stack.)
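A minimal sketch of the pattern described above, with illustrative class names rather than the repository's actual `TrackingClient`: the interface marks `get_job` as `async`, so implementations are free to await an async backend (or not) while callers always just `await` through the injected dependency.

```python
# Illustrative sketch only; class names are assumptions, not the repo's interfaces.
import asyncio
from abc import ABC, abstractmethod


class TrackingClientSketch(ABC):
    """The interface consumers depend on; async in the signature tells callers to await."""

    @abstractmethod
    async def get_job(self, job_id: str) -> dict: ...


class MlflowBackedSketch(TrackingClientSketch):
    async def get_job(self, job_id: str) -> dict:
        await asyncio.sleep(0)  # stand-in for real async I/O (e.g. s3fs calls)
        return {"id": job_id}


class InMemorySketch(TrackingClientSketch):
    async def get_job(self, job_id: str) -> dict:
        return {"id": job_id}  # no async work needed, same interface satisfied


async def consumer(client: TrackingClientSketch, job_id: str) -> dict:
    # Dependency injection: the caller awaits without caring about the implementation.
    return await client.get_job(job_id)
```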
Question: "So what does that have to do with non-async methods now being async
?"
Answer: TL;DR:
- Future proofing code refactoring
- Allows underlying code to be sync or async
- Type hinting for the IDE to remind people to await things
- Interface doesn't lie about implementations
There's a small overhead in adding `async`/`await` to code/calls, but I believe it's negligible compared to the benefits.
Because we saw some of the pain in making that change (having to find all the call sites up the stack and update them to be async), it makes sense to pre-emptively mark the other methods as async.
Hope this helps!
What's changing
After discussions with @njbrake we decided that the structure of the `compiled.json` file probably needs to be adjusted so that users can see metrics/parameters per job that runs. This PR changes the way compilation is done so that the compiled results store the job (run) ID as the key, with the results for that job as the values.
TODO: still a work in progress; we will need to make changes to the front end etc. to support this.
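As a hedged illustration of the described shape (field names and values here are made up for the example, not the PR's final schema), compiled results keyed by job/run ID could look roughly like this:

```python
# Illustrative shape only: keys are job (run) IDs, values are that job's results.
# All names and numbers below are placeholder examples.
import json

compiled = {
    "run-1a2b3c": {
        "metrics": {"accuracy": 0.91, "f1": 0.88},
        "parameters": {"model": "model-a", "max_samples": 100},
    },
    "run-4d5e6f": {
        "metrics": {"accuracy": 0.87, "f1": 0.84},
        "parameters": {"model": "model-b", "max_samples": 100},
    },
}

print(json.dumps(compiled, indent=2))  # roughly what compiled.json could contain
```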
Refs #...
Closes #...
How to test it
Steps to test the changes:
Additional notes for reviewers
Anything you'd like to add to help the reviewer understand the changes you're proposing.
I already...