
Difference between shutdown(wait=True) and shutdown(wait=False) #721


Merged
merged 3 commits into main from shutdown_case on Jul 16, 2025

Conversation

jan-janssen
Member

@jan-janssen jan-janssen commented Jul 13, 2025

When shutdown(wait=True) is called - the default - the executor waits until all future objects have completed. In contrast, when shutdown(wait=False) is called, the future objects are cancelled on the queuing system. In addition, when the Jupyter notebook is interrupted, the submitted futures are not deleted from the queuing system.
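
For illustration, a minimal sketch of the two call patterns; the executor class, cache directory, and submitted function are placeholders borrowed from the test snippets further down, not part of this PR:

from executorlib import SlurmClusterExecutor

def foo(x):
    return x

executor = SlurmClusterExecutor(cache_directory="cache_sce")
future = executor.submit(foo, 1)

executor.shutdown(wait=True)  # default: block until all futures have completed
# executor.shutdown(wait=False)  # instead: cancel the futures on the queuing system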

jan-janssen and others added 2 commits July 13, 2025 12:04
When `shutdown(wait=True)` is called - the default - then the executor waits until all future objects have completed. In contrast when `shutdown(wait=False)` is called the future objects are cancelled on the queuing system.
Contributor

coderabbitai bot commented Jul 13, 2025

Important

Review skipped

Draft detected.


Walkthrough

This change removes the terminate_tasks_on_shutdown parameter and all related validation logic from several executors, input-checking utilities, and the file-based task scheduler. Shutdown behavior during task execution now depends only on a "wait" flag, and the tests for the removed parameter are deleted.

Changes

Files/Paths and change summary:

  • executorlib/executor/flux.py, executorlib/executor/slurm.py: Removed the terminate_tasks_on_shutdown parameter from the executors and the related validation logic.
  • executorlib/standalone/inputcheck.py: Deleted the check_terminate_tasks_on_shutdown function.
  • executorlib/task_scheduler/file/task_scheduler.py: Removed terminate_tasks_on_shutdown from create_file_executor and simplified the shutdown logic.
  • executorlib/task_scheduler/file/shared.py: Modified the shutdown logic in execute_tasks_h5 to use a "wait" flag instead of the removed parameter.
  • tests/test_standalone_inputcheck.py: Deleted the test for check_terminate_tasks_on_shutdown and removed its import.
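
For orientation, a hedged sketch of what the simplified branch in execute_tasks_h5 could look like; the task dict with "shutdown" and "wait" keys follows the walkthrough and the review comment below, while the helper names are assumptions, not executorlib internals:

import queue

def wait_for_running_tasks():
    """Assumed helper: block until all running futures have completed."""

def cancel_remaining_jobs():
    """Assumed helper: cancel the remaining jobs on the queuing system."""

future_queue = queue.Queue()
future_queue.put({"shutdown": True, "wait": True})

task_dict = future_queue.get()
if task_dict.get("shutdown"):
    if task_dict["wait"]:
        wait_for_running_tasks()
    else:
        cancel_remaining_jobs()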

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Executor
    participant TaskScheduler

    User->>Executor: Initialize (no terminate_tasks_on_shutdown)
    User->>TaskScheduler: Submit shutdown with {"wait": True/False}
    TaskScheduler-->>TaskScheduler: If "wait"==True, wait for tasks to finish
    TaskScheduler->>TaskScheduler: Terminate tasks (if applicable)
    TaskScheduler->>User: Confirm shutdown

Poem

A rabbit hopped through fields of code,
Where shutdown flags once proudly strode.
With a twitch of nose, and gentle sweep,
Those old parameters fell asleep.
Now tasks await, or not, on cue—
The garden's logic, fresh and new!
🐇✨


@jan-janssen
Member Author

Correction to the changes in #715


codecov bot commented Jul 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.90%. Comparing base (15f648f) to head (86514c8).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #721      +/-   ##
==========================================
- Coverage   96.91%   96.90%   -0.02%     
==========================================
  Files          30       30              
  Lines        1363     1357       -6     
==========================================
- Hits         1321     1315       -6     
  Misses         42       42              


@jan-janssen jan-janssen requested a review from liamhuber July 13, 2025 10:07
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48be119 and ccf6064.

📒 Files selected for processing (6)
  • executorlib/executor/flux.py (0 hunks)
  • executorlib/executor/slurm.py (0 hunks)
  • executorlib/standalone/inputcheck.py (0 hunks)
  • executorlib/task_scheduler/file/shared.py (1 hunks)
  • executorlib/task_scheduler/file/task_scheduler.py (1 hunks)
  • tests/test_standalone_inputcheck.py (0 hunks)
💤 Files with no reviewable changes (4)
  • tests/test_standalone_inputcheck.py
  • executorlib/standalone/inputcheck.py
  • executorlib/executor/slurm.py
  • executorlib/executor/flux.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_old
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: notebooks
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_win
  • GitHub Check: minimal
🔇 Additional comments (1)
executorlib/task_scheduler/file/shared.py (1)

90-100: LGTM – no KeyError risk for "wait"
Verification shows every shutdown task dict (in tests and all queue-put calls across base.py, blockallocation.py, onetoone.py, communication.py, etc.) always includes a "wait" key. The conditional logic in shared.py is safe as written.

Comment on lines +115 to +118
     if execute_function != execute_in_subprocess:
         terminate_function = terminate_with_pysqa  # type: ignore
-    elif terminate_tasks_on_shutdown and execute_function == execute_in_subprocess:
-        terminate_function = terminate_subprocess  # type: ignore
     else:
-        terminate_function = None  # type: ignore
+        terminate_function = terminate_subprocess  # type: ignore
Contributor

💡 Verification agent

🧩 Analysis chain

Good simplification! Note the breaking change.

The simplified logic for determining terminate_function is cleaner and more maintainable. The function selection now depends solely on the execute_function type rather than multiple conditional flags.

This removal of the terminate_tasks_on_shutdown parameter constitutes a breaking change for any external code that was using this parameter.


🏁 Script executed:

#!/bin/bash
# Search for any remaining references to terminate_tasks_on_shutdown parameter
rg -i "terminate_tasks_on_shutdown"

# Check for any imports or function calls that might still expect this parameter
ast-grep --pattern 'create_file_executor($$$, terminate_tasks_on_shutdown=$_, $$$)'

Length of output: 521


🏁 Script executed:

#!/bin/bash
# Locate create_file_executor definition
rg -n "def create_file_executor" -n executorlib/task_scheduler/file/task_scheduler.py

# Check if terminate_tasks_on_shutdown is still referenced in this file
rg -n "terminate_tasks_on_shutdown" -n executorlib/task_scheduler/file/task_scheduler.py

# Print lines around the function signature for context
sed -n '100,140p' executorlib/task_scheduler/file/task_scheduler.py

Length of output: 1549


Breaking Change: Remove terminate_tasks_on_shutdown References

The new create_file_executor no longer accepts or propagates a terminate_tasks_on_shutdown parameter, but several executors and tests still reference it. Please update or clean up these callers to match the simplified API:

• executorlib/executor/slurm.py

  • Remove the terminate_tasks_on_shutdown: bool = True parameter from the SlurmExecutor signature and any internal references.

• executorlib/executor/flux.py

  • Remove the terminate_tasks_on_shutdown: bool = True parameter from the FluxExecutor signature and any internal references.

• tests/test_fluxclusterexecutor.py

  • Drop the terminate_tasks_on_shutdown= arguments passed when instantiating executors.

Fix these to align with the new create_file_executor signature or reintroduce a no-op parameter if backward compatibility is required.
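
Should backward compatibility be preferred, a hedged sketch of the no-op option mentioned above; the shim name, the warning, and the delegation are assumptions, not code from this PR:

import warnings

def create_file_executor_compat(terminate_tasks_on_shutdown=None, **kwargs):
    """Hypothetical shim: accept the removed argument, warn, and ignore it."""
    if terminate_tasks_on_shutdown is not None:
        warnings.warn(
            "terminate_tasks_on_shutdown is deprecated and has no effect",
            DeprecationWarning,
            stacklevel=2,
        )
    # delegate to the real factory with the remaining keyword arguments
    # return create_file_executor(**kwargs)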

🤖 Prompt for AI Agents
In executorlib/task_scheduler/file/task_scheduler.py around lines 115 to 118,
the code and related executors still reference the obsolete
terminate_tasks_on_shutdown parameter which is no longer accepted by
create_file_executor. Remove the terminate_tasks_on_shutdown parameter from the
SlurmExecutor and FluxExecutor class signatures and their internal usage in
executorlib/executor/slurm.py and executorlib/executor/flux.py. Also, update
tests/test_fluxclusterexecutor.py to eliminate any terminate_tasks_on_shutdown
arguments when creating executor instances to align with the simplified API.

@jan-janssen
Member Author

@liamhuber Once this is merged I am happy to release a new version, which hopefully allows you to complete the integration in pyiron_workflow.

@liamhuber
Member

Super! What is here looks good to me, but it only succeeds in two of the three cases: it fails at killing the process and then restarting the same job, where it submits a second slurm job instead of reconnecting.

It looks to me like we need more logic in task_scheduler.file.shared around line 135 for execute_tasks_h5. The problem is clear even to me: there is an "if output h5 file does not exist" clause which then simply clears any existing input h5 file and runs everything again. I think what needs to happen is (sketched after this list):

  • Check also for an input file, if it's not there proceed with existing code as usual
  • Look for the cache identifier (directory+key?) to be identified in the slurm (or flux?) queue -- if it's not there, raise an error
  • Follow the usual registration process for the queue to wait to unpack the result serialized to the output h5 file
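
A hedged sketch of that recovery logic; the helpers, the error message, and the input-file suffix (the "_i.h5" naming appears in a later comment) are assumptions, not executorlib internals:

import os

def job_in_queue(cache_entry):
    """Hypothetical: look up the cache identifier (directory + key) on the
    slurm (or flux) queue and report whether the job is still there."""
    raise NotImplementedError

def register_and_wait(cache_entry):
    """Hypothetical: follow the usual registration process and wait for the
    result serialized to the output h5 file."""
    raise NotImplementedError

def recover_or_resubmit(cache_entry):
    input_file = cache_entry + "_i.h5"
    if not os.path.exists(input_file):
        pass  # no input file: proceed with the existing code as usual
    elif not job_in_queue(cache_entry):
        raise RuntimeError("input written but job not found on the queue")
    else:
        register_and_wait(cache_entry)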

The nuts and bolts are not familiar enough to me to try this myself yet though.

How I tested:

import time

from executorlib import SlurmClusterExecutor

def foo(x):
    time.sleep(x)
    return x

with SlurmClusterExecutor(
    cache_directory="cache_sce", 
    resource_dict={
        "partition": "s.cmfe", 
        "cache_key": "foo",
    }
) as executor:
    future = executor.submit(foo, 30)
    print(future)
    
print(future.result())
print(future, "\n")
  • Python always running
    • Still works fine when I simply let it run
  • Restarting and recovering a finished job
    • When I start the run,
    • see it on the slurm queue,
    • kill the notebook,
    • wait for it to finish on the slurm queue (it persisted there fine),
    • and then re-run the notebook:
    • It runs fine, not blazing fast but way, way faster than my 30s sleep
  • Restarting and recovering a job still in the queue
    • When I start the run,
    • see it on the slurm queue,
    • kill the notebook,
    • restart the notebook before the slurm job is finished,
    • ...it submits a second job to the slurm queue

@jan-janssen
Member Author

@liamhuber Good catch.

To me there are again two cases:

  • somebody submits a number of calculations and then realizes that the resources defined are not sufficient. So in that case the user might want to delete all the jobs and submit them again.
  • the alternative case is the user is happy with the setup of their calculation and they just want to check if they are already done or not.

For the first case it might be helpful to have a function to delete all the jobs for a given project. The necessary information can be gathered from the executorlib.get_cache_data() function https://executorlib.readthedocs.io/en/latest/1-single-node.html#cache but a separate delete function would still need to be implemented.
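
A hedged sketch of what such a delete helper could look like; get_cache_data() is real (see the link above), but the "queue_id" field name and the queue-deletion call are assumptions:

from executorlib import get_cache_data

def delete_job_from_queue(queue_id):
    """Hypothetical: cancel the job with this id on the queuing system."""
    raise NotImplementedError

def delete_project_jobs(cache_directory):
    # Gather the cached job metadata and delete the corresponding queue jobs.
    for entry in get_cache_data(cache_directory=cache_directory):
        queue_id = entry.get("queue_id")  # assumed field name
        if queue_id is not None:
            delete_job_from_queue(queue_id)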

@jan-janssen jan-janssen marked this pull request as draft July 14, 2025 19:14
@liamhuber
Member

@jan-janssen, thanks for the fixes! Working on #732, my test example above works for all three of my cases (stopping running jobs is also of interest to me, but it's not urgent and I agree with your comment that this can be done in a different PR).

In addition to the above, I can handle all three cases (continuous, restart and wait until finished, and restart and re-submit while running) also with an explicitly instantiated and explicitly shut-down executor:

import time

from executorlib import SlurmClusterExecutor

def foo(x):
    time.sleep(x)
    return x

executor = SlurmClusterExecutor(
    cache_directory="cache_sce", 
    resource_dict={
        "partition": "s.cmfe", 
        "cache_key": "foo",
    }
)
future = executor.submit(foo, 10)
print(future)
executor.shutdown()
print(future.result())
print(future)

And using my integration with pyiron_workflow:

import pyiron_workflow as pwf
from pyiron_workflow.executors.wrapped_executorlib import CacheSlurmClusterExecutor


wf = pwf.Workflow("with_clfe")
wf.n1 = pwf.std.UserInput(30)
wf.n2 = pwf.std.Sleep(wf.n1)
wf.n3 = pwf.std.UserInput(wf.n2)

wf.use_cache = False
wf.n2.use_cache = False

with CacheSlurmClusterExecutor(resource_dict={"partition": "s.cmfe"}) as exe:
    wf.n2.executor = exe
    wf.run()
wf.outputs.to_value_dict()

Interestingly, what stops working for me as of this PR is late and direct instantiation of the executor, which I handle in pyiron_workflow like this:

import pyiron_workflow as pwf
from pyiron_workflow.executors.wrapped_executorlib import CacheSlurmClusterExecutor


wf = pwf.Workflow("with_clfe")
wf.n1 = pwf.std.UserInput(30)
wf.n2 = pwf.std.Sleep(wf.n1)
wf.n3 = pwf.std.UserInput(wf.n2)

wf.use_cache = False
wf.n2.use_cache = False

wf.n2.executor = (CacheSlurmClusterExecutor, (), {"resource_dict": {"partition": "s.cmfe"}})
wf.run()

wf.outputs.to_value_dict()

On the one hand, I believe it's very likely that I need to go in and be more careful in how the lifetime of the executor is managed in this case. On the other hand, the way it fails is very curious to me, especially given the simplicity of the diff in this PR. What happens is that the _i.h5 file and run_queue.sh content gets written, but no slurm job is ever submitted. Is it possible something in my usage represents a meaningful edge case for executorlib? Again, on main the final example is working fine (at least in the case of keeping the notebook running the whole time), so the culprit is in this limited diff...

@liamhuber
Member

Interestingly, what stops working for me as of this PR is late and direct instantiation of the executor

This was resolved by being more responsible on my end: I delay the instantiation until later in my process, and, since such executors are guaranteed to be used in a one-shot fashion by my code, I add a shutdown call as a callback on the one-and-only future such executors produce. All my tests pass locally and all three cases (run, shutdown and wait, shutdown and reconnect while running) are working beautifully with SLURM on the cluster.
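
For illustration, a minimal sketch of that one-shot pattern, reusing the foo and executor setup from the earlier snippets; the wait=False choice inside the callback is an assumption:

executor = SlurmClusterExecutor(cache_directory="cache_sce")
future = executor.submit(foo, 10)
# One-shot: tear the executor down once its single future has completed.
future.add_done_callback(lambda f: executor.shutdown(wait=False))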

The ability to query and kill running jobs will be useful for me going forward, but once this and #732 are merged down and released, I have everything I need for a beta release of executorlib-pyiron_workflow integration 🚀

@jan-janssen
Member Author

Great, then I'll have another careful look tomorrow and get a new release ready.

@jan-janssen jan-janssen marked this pull request as ready for review July 16, 2025 14:09
@jan-janssen jan-janssen merged commit ba2c702 into main Jul 16, 2025
31 checks passed
@jan-janssen jan-janssen deleted the shutdown_case branch July 16, 2025 14:09