openmpi/orte
Joshua Hursey c6fab32137
Fix the sigkill timeout sleep to prevent SIGCHLD from preventing completion.
 * The user can set `-mca odls_base_sigkill_timeout 30` to have ORTE wait
   30 seconds before sending SIGTERM, then another 30 seconds before sending
   SIGKILL to the remaining processes. This usually happens on an abnormal
   termination. Sometimes the user wants to delay the cleanup to give the
   system time to write out core files or run other diagnostics.
 * The problem is that child processes may complete while ORTE is
   waiting out this timeout, and each SIGCHLD interrupts the `sleep` call.
   Without a loop around it, the sleep could effectively be ignored in
   this case.
   - `sleep` returns the amount of time remaining to sleep. If it was
     interrupted by a signal, the return value is a positive number less
     than or equal to the parameter passed to it. If it slept the whole
     time, it returns 0.
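
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration of the technique, not the actual ORTE change: the helper name `sleep_uninterrupted` and its return value (a count of interruptions) are assumptions made for this example.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper (not taken from ORTE): keep sleeping until the
 * full interval has elapsed, even if signals such as SIGCHLD cut
 * individual sleep() calls short. Returns how many times the sleep
 * was interrupted. */
static unsigned int sleep_uninterrupted(unsigned int seconds)
{
    unsigned int remaining = seconds;
    unsigned int interruptions = 0;

    while (remaining > 0) {
        /* sleep() returns 0 when the whole interval elapsed, or the
         * number of unslept seconds when a signal interrupted it. */
        unsigned int left = sleep(remaining);

        /* The unslept time never exceeds what was requested. */
        assert(left <= remaining);

        if (left > 0) {
            interruptions++;
        }
        remaining = left; /* retry with whatever time is still owed */
    }
    return interruptions;
}
```

A caller would replace a single `sleep(timeout)` with `sleep_uninterrupted(timeout)`. Because `sleep` reports whole seconds, a heavily interrupted wait may run slightly longer than requested; a version needing tighter bounds could use `nanosleep` with its remaining-time argument instead.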

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 0e8a97c598)
2019-10-02 14:49:47 -05:00
bindings Expose opal_set_using_threads and improve error message on missing ompi_info. 2017-01-19 07:57:58 -05:00
etc Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
include Complete job control integration 2018-08-20 16:08:54 -07:00
mca Fix the sigkill timeout sleep to prevent SIGCHLD from preventing completion. 2019-10-02 14:49:47 -05:00
orted Add 'orte_' prefix to noop_mpir_breakpoint_ptr. 2019-09-19 08:47:17 -04:00
runtime Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init. 2019-04-01 11:10:04 +01:00
test regx: fixed the order of hosts for ranges with different prefixes 2019-02-11 12:06:49 +02:00
tools orterun: remove duplicate code 2019-08-19 15:49:57 -04:00
util Remove stale ORTE code 2019-03-31 11:26:18 -07:00
common_sym_whitelist.txt common syms: whitelist bison-generated common symbols 2016-01-16 03:53:14 -08:00
Doxyfile Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Makefile.am Purge whitespace from the repo 2015-06-23 20:59:57 -07:00