Description
Developers,
Thank you so much for this useful package.
I would like to request an improvement to job handling.
The current situation
When we submit jobs with the following snippet:
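```python
from dpdispatcher import Submission

# machine, resources and task_list are defined elsewhere
submission = Submission(
    work_base="./",
    machine=machine,
    resources=resources,
    task_list=task_list,
    # ...
)
submission.run_submission()
```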
Assume we have 50 jobs in task_list. If one of them hits a runtime error, dpdispatcher immediately exits the connection session and leaves the other 49 unfinished jobs unattended, so we have to handle them manually. This is inefficient.
My questions
- Is it possible to keep the connection session open and continue monitoring the unfinished jobs until all of them have either finished or failed, and then download the results of the finished ones?
- Alternatively, is it possible to postpone reporting the failed jobs until all other jobs have finished?
- Is it possible to disable resubmission of failed jobs? Some jobs run for a very long time before the error occurs, so resubmitting them 3 times, as the current code does, is likely to waste computing resources.
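To make the request concrete, here is a minimal, self-contained sketch of the monitoring policy asked for above. It does not use dpdispatcher's API: `SimJob`, `run_all`, `JobsFailedError`, `max_retries` and `fail_fast` are hypothetical names, and job outcomes are simulated instead of being read from a real scheduler. `fail_fast=True` models the current behaviour described above, while the default keeps monitoring until every job has finished or failed and hands the failures back to the caller to report at the end. `max_retries=0` disables resubmission.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SimJob:
    """Hypothetical stand-in for one remote job (not a dpdispatcher class).

    succeeds(attempt) decides whether the given 0-based attempt finishes
    successfully; a real implementation would query the scheduler instead.
    """
    name: str
    succeeds: Callable[[int], bool]
    attempts: int = 0
    state: str = "running"  # "running", "finished" or "failed"


class JobsFailedError(RuntimeError):
    """Hypothetical error raised when a job has failed for good."""


def run_all(jobs: List[SimJob], max_retries: int = 0,
            fail_fast: bool = False) -> Tuple[List[str], List[str]]:
    """Poll jobs until each one has finished or used up its retries.

    Each job is tried at most 1 + max_retries times; max_retries=0 means a
    failed job is never resubmitted. With fail_fast=True, the first job that
    fails for good raises immediately, leaving the others unattended.
    """
    while any(job.state == "running" for job in jobs):
        for job in jobs:
            if job.state != "running":
                continue
            ok = job.succeeds(job.attempts)
            job.attempts += 1
            if ok:
                job.state = "finished"
            elif job.attempts > max_retries:
                job.state = "failed"
                if fail_fast:
                    raise JobsFailedError(f"{job.name} failed")
            # otherwise the job stays "running", i.e. it is resubmitted
    finished = [j.name for j in jobs if j.state == "finished"]
    failed = [j.name for j in jobs if j.state == "failed"]
    return finished, failed


if __name__ == "__main__":
    def make_jobs() -> List[SimJob]:
        # One job that always fails, followed by 49 that succeed first time.
        return [SimJob("bad", lambda attempt: False)] + [
            SimJob(f"task{i}", lambda attempt: True) for i in range(49)
        ]

    jobs = make_jobs()
    try:
        run_all(jobs, fail_fast=True)
    except JobsFailedError as err:
        left = sum(j.state == "running" for j in jobs)
        print(f"fail-fast: {err}; {left} jobs left unattended")

    jobs = make_jobs()
    finished, failed = run_all(jobs, max_retries=0)
    # The results of `finished` would be downloaded here, and only then
    # would the failures be reported.
    print(f"deferred: {len(finished)} finished, failed: {failed}")
```

Run as a script, this prints `fail-fast: bad failed; 49 jobs left unattended` followed by `deferred: 49 finished, failed: ['bad']`, which is the difference between the current behaviour and the requested one.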
Thanks.