Wrap executorlib executors #678
Conversation
To exploit the new caching interface provided in executorlib-1.5.0, so that nodes can rely on their running state and lexical path to access previously executed results. Locally with the SingleNodeExecutor everything is looking good, but that doesn't natively support terminating the process that submits the job. I'd like to play around with this using the SlurmClusterExecutor on the cluster before making further changes.
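For orientation, here is a minimal sketch of the kind of caching being exploited. It assumes executorlib's executors accept a `cache_directory` keyword; the exact shape of the 1.5.0 caching interface that the wrapped executors rely on may differ.

```python
# Minimal sketch, not this PR's implementation: resubmitting the same call against
# a shared cache_directory lets the second submission reuse the stored result.
# Assumes SingleNodeExecutor(cache_directory=...) as in recent executorlib releases.
from executorlib import SingleNodeExecutor


def slow_square(x):
    import time
    time.sleep(5)
    return x * x


with SingleNodeExecutor(cache_directory="./cache") as exe:
    print(exe.submit(slow_square, 4).result())  # computed, takes ~5 s

with SingleNodeExecutor(cache_directory="./cache") as exe:
    print(exe.submit(slow_square, 4).result())  # served from the cache
```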
Codecov Report

Attention: Patch coverage is 88.88%.

❌ Your patch status has failed because the patch coverage (88.88%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

```diff
@@            Coverage Diff             @@
##             main     #678      +/-   ##
==========================================
- Coverage   92.11%   92.05%   -0.07%
==========================================
  Files          33       34       +1
  Lines        3665     3725      +60
==========================================
+ Hits         3376     3429      +53
- Misses        289      296       +7
```
Since we now depend explicitly on a new feature.

It causes a weird hang that blocks observability.
And make the expected file independently accessible.

And hide it behind its own boolean flag for testing.

And make the file name fixed and accessible at the class level.
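Purely to illustrate the shape these three commits describe; every name below is hypothetical, not the actual pyiron_workflow class or attribute.

```python
from pathlib import Path


class CacheExecutorSketch:
    """Hypothetical stand-in for the wrapped executor described above."""

    CACHE_FILE_NAME = "cache.h5"  # fixed and accessible at the class level

    def __init__(self, cache_directory: str, special_behaviour: bool = False):
        self.cache_directory = Path(cache_directory)
        # The special behaviour sits behind its own boolean flag so tests can
        # toggle it independently of everything else.
        self.special_behaviour = special_behaviour

    @classmethod
    def expected_cache_file(cls, cache_directory: str) -> Path:
        # Independently accessible: callers can ask where the file will live
        # without constructing or running anything.
        return Path(cache_directory) / cls.CACHE_FILE_NAME


print(CacheExecutorSketch.expected_cache_file("/tmp/some_node"))
```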
The Slurm executor populates this with a submission script, etc.
From @jan-janssen in [this comment](pyiron/executorlib#708 (comment)). Co-authored-by: Jan Janssen
This is currently working for all attempted behaviour when run together with pyiron/executorlib#712.

The local file executor got directly included in executorlib as a testing tool.
And always execute tuple-specified executors in a with-clause, since there is only ever one instance of this executor. If we have already been assigned an executor _instance_, then we trust the user to be managing its state and submit directly rather than wrapping in a with-clause.
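A rough sketch of that dispatch rule (a hypothetical helper, not the actual Runnable code):

```python
# Tuples of (ExecutorClass, args, kwargs) describe a fresh, single-use executor
# whose lifecycle we own, so it gets created and used inside a with-clause.
# A pre-built Executor instance belongs to the user, so we just submit to it.
from concurrent.futures import Executor, Future, ThreadPoolExecutor


def dispatch(executor_spec, fn, /, *args, **kwargs) -> Future:
    if isinstance(executor_spec, tuple):
        executor_class, exe_args, exe_kwargs = executor_spec
        with executor_class(*exe_args, **exe_kwargs) as owned_executor:
            return owned_executor.submit(fn, *args, **kwargs)
    if isinstance(executor_spec, Executor):
        return executor_spec.submit(fn, *args, **kwargs)
    raise TypeError(f"Expected a tuple or an Executor, got {type(executor_spec)}")


print(dispatch((ThreadPoolExecutor, (), {"max_workers": 1}), pow, 2, 10).result())
print(dispatch(ThreadPoolExecutor(max_workers=1), pow, 2, 10).result())
```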
Recent changes threw off the balance of times in the first vs. second run, so instead compare to what you actually care about: that the second run bypasses the sleep call.
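Something along these lines (a self-contained toy illustration, not the actual test):

```python
import time

SLEEP_SECONDS = 2.0
_cache = {}


def slow_identity(x):
    # The second call with the same input should come from the cache and
    # therefore never pay the sleep.
    if x not in _cache:
        time.sleep(SLEEP_SECONDS)
        _cache[x] = x
    return _cache[x]


slow_identity(42)  # first run, pays the sleep

start = time.perf_counter()
slow_identity(42)  # second run, served from the cache
elapsed = time.perf_counter() - start

# Assert the thing we actually care about, rather than comparing first- and
# second-run wall times against each other.
assert elapsed < SLEEP_SECONDS
```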
Instead of a with-clause. This way the executor is still permitted to release the thread before the job is done, but we still guarantee that executors created by bespoke instructions get shut down at the end of their one-future lifetime.
There was necessarily only the one future, so don't wait at shutdown. This removes the need for accepting the runtime error and prevents the wrapped executorlib executors from hanging indefinitely.
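The pattern, roughly, in a self-contained form (a hypothetical helper; the real change lives in the wrapped executors and the Runnable callback):

```python
# The submitting thread is released immediately (no with-clause holds it), and the
# single-use executor shuts itself down once its one and only future completes.
# wait=False at shutdown: there is necessarily only the one future, so there is
# nothing else to wait for, and waiting risks hanging indefinitely.
import time
from concurrent.futures import Future, ThreadPoolExecutor


def submit_single_use(executor_class, fn, /, *args, **kwargs) -> Future:
    executor = executor_class()
    future = executor.submit(fn, *args, **kwargs)
    future.add_done_callback(lambda f: executor.shutdown(wait=False))
    return future  # returned before the job finishes


future = submit_single_use(ThreadPoolExecutor, time.sleep, 0.1)
print(future.result())  # prints None once the sleep completes
```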
Waiting on
Together with pyiron/executorlib#732 this is working very nicely on the cluster now. I can run through the cases I care about, and in all cases everything runs perfectly smoothly. I.e., I can start with this:

```python
import pyiron_workflow as pwf
from pyiron_workflow.executors.wrapped_executorlib import CacheSlurmClusterExecutor

wf = pwf.Workflow("executor_test")
wf.n1 = pwf.std.UserInput(20)
wf.n2 = pwf.std.Sleep(wf.n1)
wf.n3 = pwf.std.UserInput(wf.n2)
wf.n2.executor = (CacheSlurmClusterExecutor, (), {"resource_dict": {"partition": "s.cmfe"}})
wf.run()
```

And then either let it run, or restart the kernel and follow up with this after the appropriate delay for the case I'm interested in:

```python
import pyiron_workflow as pwf
from pyiron_workflow.executors.wrapped_executorlib import CacheSlurmClusterExecutor

wf = pwf.Workflow("executor_test")
wf.load(filename=wf.label + "/recovery.pckl")
wf.failed = False
wf.use_cache = False
wf.run()
```

Outside the scope of this PR but on the TODO list is:
Including the lower bound.

And debug the error message.

Since that's the way users will typically interact with this field. I also had to change the inheritance order to make sure we were dealing with the user-facing executor and not the task scheduler, but this doesn't impact the submit loop.
So we pass through the Runnable._shutdown_executor_callback process.

Codecov complains that
Actually, I'd be more comfortable with this PR if it included these. I'll still leave exposure in the API for later, but let's take a crack at robust testing right here.
* Test slurm submission
* Don't apply callbacks to cached returns
* Only validate submission-time resources: otherwise we run into trouble where it loads saved executor instructions (that already have what it would use anyhow)
* Mark module
* Test cached result branch