When mpirun operates in --continuous mode, we won't terminate the job when a remote process dies. In that case, we have to activate both the waitpid _and_ the IOF-complete states to ensure we properly mark the proc as dead and perform any required notifications.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Ralph Castain 2016-10-25 12:18:14 -07:00
Parent 2076622924
Commit d031946c46

@@ -428,6 +428,13 @@ static void proc_errors(int fd, short args, void *cbdata)
     if (orte_get_attribute(&jdata->attributes, ORTE_JOB_CONTINUOUS_OP, NULL, OPAL_BOOL)) {
         /* always mark the waitpid as having fired */
         ORTE_ACTIVATE_PROC_STATE(&pptr->name, ORTE_PROC_STATE_WAITPID_FIRED);
+        /* if this is a remote proc, we won't hear anything more about it
+         * as the default behavior would be to terminate the job. So be sure to
+         * mark the IOF as having completed too so we correctly mark this proc
+         * as dead and notify everyone as required */
+        if (!ORTE_FLAG_TEST(pptr, ORTE_PROC_FLAG_LOCAL)) {
+            ORTE_ACTIVATE_PROC_STATE(&pptr->name, ORTE_PROC_STATE_IOF_COMPLETE);
+        }
         goto cleanup;
     }