Difference between shutdown(wait=True) and shutdown(wait=False) #721

Conversation
When `shutdown(wait=True)` is called - the default - the executor waits until all future objects have completed. In contrast, when `shutdown(wait=False)` is called, the future objects are cancelled on the queuing system.
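As a minimal sketch of the default `wait=True` behaviour, assuming the `SlurmClusterExecutor` and `cache_directory` arguments used later in this thread (this is an illustration, not taken verbatim from the executorlib documentation):

```python
import time
from executorlib import SlurmClusterExecutor  # executor class referenced later in this thread

def slow(x):
    time.sleep(x)
    return x

# Assumed usage: exiting the with-block calls shutdown(wait=True) by default,
# so the submitted future is guaranteed to be finished once the block exits.
with SlurmClusterExecutor(cache_directory="cache_sce") as executor:
    future = executor.submit(slow, 5)

print(future.result())  # already completed, because shutdown waited for it
```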
Walkthrough: This change removes the `terminate_tasks_on_shutdown` parameter.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Executor
    participant TaskScheduler
    User->>Executor: Initialize (no terminate_tasks_on_shutdown)
    User->>TaskScheduler: Submit shutdown with {"wait": True/False}
    TaskScheduler-->>TaskScheduler: If "wait"==True, wait for tasks to finish
    TaskScheduler->>TaskScheduler: Terminate tasks (if applicable)
    TaskScheduler->>User: Confirm shutdown
```
Correction to the changes in #715
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main     #721    +/-  ##
========================================
- Coverage   96.91%   96.90%   -0.02%
========================================
  Files          30       30
  Lines        1363     1357       -6
========================================
- Hits         1321     1315       -6
  Misses         42       42
```
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- executorlib/executor/flux.py (0 hunks)
- executorlib/executor/slurm.py (0 hunks)
- executorlib/standalone/inputcheck.py (0 hunks)
- executorlib/task_scheduler/file/shared.py (1 hunks)
- executorlib/task_scheduler/file/task_scheduler.py (1 hunks)
- tests/test_standalone_inputcheck.py (0 hunks)
💤 Files with no reviewable changes (4)
- tests/test_standalone_inputcheck.py
- executorlib/standalone/inputcheck.py
- executorlib/executor/slurm.py
- executorlib/executor/flux.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: unittest_flux_openmpi
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
- GitHub Check: unittest_mpich (macos-latest, 3.13)
- GitHub Check: unittest_old
- GitHub Check: unittest_flux_mpich
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
- GitHub Check: notebooks
- GitHub Check: notebooks_integration
- GitHub Check: unittest_openmpi (macos-latest, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
- GitHub Check: unittest_win
- GitHub Check: minimal
🔇 Additional comments (1)

executorlib/task_scheduler/file/shared.py (1), lines 90-100: LGTM – no KeyError risk for `"wait"`

Verification shows every shutdown task dict (in tests and all queue-put calls across `base.py`, `blockallocation.py`, `onetoone.py`, `communication.py`, etc.) always includes a `"wait"` key. The conditional logic in `shared.py` is safe as written.
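For readers less familiar with the internals, here is a simplified, hypothetical sketch of the kind of branching on the `"wait"` key described above; it is illustrative only and not the actual code in `executorlib/task_scheduler/file/shared.py`:

```python
from concurrent.futures import Future

def handle_shutdown_task(task: dict, pending: list[Future]) -> None:
    # Hypothetical illustration of the "wait" handling discussed above --
    # not the real executorlib implementation.
    if task["wait"]:  # safe lookup: every shutdown task dict carries a "wait" key
        for future in pending:
            future.result()  # wait=True: block until each submitted future has completed
    else:
        for future in pending:
            future.cancel()  # wait=False: cancel whatever has not started yet
```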
```diff
 if execute_function != execute_in_subprocess:
     terminate_function = terminate_with_pysqa  # type: ignore
-elif terminate_tasks_on_shutdown and execute_function == execute_in_subprocess:
-    terminate_function = terminate_subprocess  # type: ignore
 else:
-    terminate_function = None  # type: ignore
+    terminate_function = terminate_subprocess  # type: ignore
```
💡 Verification agent
🧩 Analysis chain
Good simplification! Note the breaking change.

The simplified logic for determining `terminate_function` is cleaner and more maintainable. The function selection now depends solely on the `execute_function` type rather than on multiple conditional flags.

The removal of the `terminate_tasks_on_shutdown` parameter constitutes a breaking change for any external code that was using this parameter.
🏁 Script executed:

```bash
#!/bin/bash
# Search for any remaining references to the terminate_tasks_on_shutdown parameter
rg -i "terminate_tasks_on_shutdown"

# Check for any imports or function calls that might still expect this parameter
ast-grep --pattern 'create_file_executor($$$, terminate_tasks_on_shutdown=$_, $$$)'
```

Length of output: 521
🏁 Script executed:

```bash
#!/bin/bash
# Locate the create_file_executor definition
rg -n "def create_file_executor" executorlib/task_scheduler/file/task_scheduler.py

# Check if terminate_tasks_on_shutdown is still referenced in this file
rg -n "terminate_tasks_on_shutdown" executorlib/task_scheduler/file/task_scheduler.py

# Print lines around the function signature for context
sed -n '100,140p' executorlib/task_scheduler/file/task_scheduler.py
```

Length of output: 1549
Breaking Change: Remove `terminate_tasks_on_shutdown` References

The new `create_file_executor` no longer accepts or propagates a `terminate_tasks_on_shutdown` parameter, but several executors and tests still reference it. Please update or clean up these callers to match the simplified API:

- executorlib/executor/slurm.py: remove the `terminate_tasks_on_shutdown: bool = True` parameter from the SlurmExecutor signature and any internal references.
- executorlib/executor/flux.py: remove the `terminate_tasks_on_shutdown: bool = True` parameter from the FluxExecutor signature and any internal references.
- tests/test_fluxclusterexecutor.py: drop the `terminate_tasks_on_shutdown=` arguments passed when instantiating executors.

Fix these to align with the new `create_file_executor` signature, or reintroduce a no-op parameter if backward compatibility is required.
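If backward compatibility is preferred over a hard break, one possible shape for the suggested no-op parameter is sketched below; the actual signature of `create_file_executor` is assumed here, not copied from the repository:

```python
import warnings

def create_file_executor(*args, terminate_tasks_on_shutdown=None, **kwargs):
    # Hypothetical compatibility shim: accept the old flag, warn, and ignore it.
    if terminate_tasks_on_shutdown is not None:
        warnings.warn(
            "terminate_tasks_on_shutdown is deprecated and ignored; the terminate "
            "function is now selected from execute_function alone.",
            DeprecationWarning,
            stacklevel=2,
        )
    # ... delegate to the simplified implementation with the remaining arguments ...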
🤖 Prompt for AI Agents
In executorlib/task_scheduler/file/task_scheduler.py around lines 115 to 118,
the code and related executors still reference the obsolete
terminate_tasks_on_shutdown parameter which is no longer accepted by
create_file_executor. Remove the terminate_tasks_on_shutdown parameter from the
SlurmExecutor and FluxExecutor class signatures and their internal usage in
executorlib/executor/slurm.py and executorlib/executor/flux.py. Also, update
tests/test_fluxclusterexecutor.py to eliminate any terminate_tasks_on_shutdown
arguments when creating executor instances to align with the simplified API.
@liamhuber Once this is merged I am happy to release a new version, which hopefully allows you to complete the integration in `pyiron_workflow`.
Super! What is here looks good to me, but it only succeeds at two of the three cases: it fails at killing the process and then restarting the same job. It fails by submitting a second slurm job. It looks to me like we need more logic in

The nuts and bolts are not familiar enough to me to try this myself yet though. How I tested:

```python
import time
from executorlib import SlurmClusterExecutor

def foo(x):
    time.sleep(x)
    return x

with SlurmClusterExecutor(
    cache_directory="cache_sce",
    resource_dict={
        "partition": "s.cmfe",
        "cache_key": "foo",
    }
) as executor:
    future = executor.submit(foo, 30)
    print(future)
    print(future.result())
    print(future, "\n")
```
@liamhuber Good catch. To me there are again two cases:

For the first case it might be helpful to have a function to delete all the jobs for a given project. This information can be gathered from the
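As a rough sketch of such a helper, assuming each submitted task records its SLURM job id somewhere under the cache directory (the `queue_id` file name below is purely hypothetical, as is the helper itself):

```python
import subprocess
from pathlib import Path

def cancel_all_jobs(cache_directory: str) -> None:
    """Hypothetical helper: cancel every SLURM job recorded under a cache directory.

    Assumes each task folder stores its queue id in a plain-text 'queue_id' file;
    the real executorlib cache layout may differ.
    """
    for queue_id_file in Path(cache_directory).rglob("queue_id"):
        job_id = queue_id_file.read_text().strip()
        if job_id:
            # scancel is the standard SLURM command for cancelling a job
            subprocess.run(["scancel", job_id], check=False)

# Example: cancel_all_jobs("cache_sce")
```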
@jan-janssen, thanks for the fixes! Working on #732, my test example above works for all three of my cases (stopping running jobs is also of interest to me, but it's not urgent for me and I agree with your comment that this can be done in a different PR).

In addition to the above, I can handle all three cases (continuous, restart and wait until finished, and restart and re-submit while running) also with an explicitly instantiated and shut-down executor:

```python
import time
from executorlib import SlurmClusterExecutor

def foo(x):
    time.sleep(x)
    return x

executor = SlurmClusterExecutor(
    cache_directory="cache_sce",
    resource_dict={
        "partition": "s.cmfe",
        "cache_key": "foo",
    }
)
future = executor.submit(foo, 10)
print(future)
executor.shutdown()
print(future.result())
print(future)
```

And using my integration with `pyiron_workflow`:

```python
import pyiron_workflow as pwf
from pyiron_workflow.executors.wrapped_executorlib import CacheSlurmClusterExecutor

wf = pwf.Workflow("with_clfe")
wf.n1 = pwf.std.UserInput(30)
wf.n2 = pwf.std.Sleep(wf.n1)
wf.n3 = pwf.std.UserInput(wf.n2)
wf.use_cache = False
wf.n2.use_cache = False

with CacheSlurmClusterExecutor(resource_dict={"partition": "s.cmfe"}) as exe:
    wf.n2.executor = exe
    wf.run()

wf.outputs.to_value_dict()
```

Interestingly, what stops working for me as of this PR is late and direct instantiation of the executor, which I handle in

```python
import pyiron_workflow as pwf
from pyiron_workflow.executors.wrapped_executorlib import CacheSlurmClusterExecutor

wf = pwf.Workflow("with_clfe")
wf.n1 = pwf.std.UserInput(30)
wf.n2 = pwf.std.Sleep(wf.n1)
wf.n3 = pwf.std.UserInput(wf.n2)
wf.use_cache = False
wf.n2.use_cache = False

wf.n2.executor = (CacheSlurmClusterExecutor, (), {"resource_dict": {"partition": "s.cmfe"}})
wf.run()
wf.outputs.to_value_dict()
```

On the one hand, I believe it's very likely that I need to go in and be more careful about how the lifetime of the executor is managed in this case. On the other hand, the way it fails is very curious to me, especially given the simplicity of the diff in this PR. What happens is that the … is it possible that this is …
This was resolved by being more responsible on my end: I delay the instantiation until later in my process, and since such executors are guaranteed to be used in a one-shot fashion by my code I add a …

The ability to query and kill running jobs will be useful for me going forward, but once this and #732 are merged down and released, I have everything I need for a beta release of …
Great, then I will have another careful look tomorrow and get a new release ready.
When `shutdown(wait=True)` is called - the default - the executor waits until all future objects have completed. In contrast, when `shutdown(wait=False)` is called, the future objects are cancelled on the queuing system. In addition, when the Jupyter notebook is interrupted, the submitted futures are not deleted from the queuing system.
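A minimal sketch of the `wait=False` branch described here, reusing the `SlurmClusterExecutor` example from earlier in the thread; whether the pending future reports as cancelled immediately depends on how far the queuing system has progressed, so the final print is an expectation rather than a guarantee:

```python
import time
from executorlib import SlurmClusterExecutor

def slow(x):
    time.sleep(x)
    return x

executor = SlurmClusterExecutor(cache_directory="cache_sce")
future = executor.submit(slow, 60)

# wait=False: return immediately and cancel the pending job on the queuing system
executor.shutdown(wait=False)

print(future.cancelled())  # expected to become True once the cancellation is processed
```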