Since we modified ORTE to declare that any process that terminates after calling "init" while at least one other process has not yet called "init" is an error, we have to ensure that non-MPI ORTE apps (i.e., apps that call orte_init but not mpi_init) include a barrier in orte_init. Otherwise, fast ORTE apps almost always wind up triggering the "abnormal termination" condition.
The barrier is protected with a test to ensure that MPI apps don't execute it and wind up doing two barriers during their init. This commit was SVN r22378.
Parent: ef1bfaa823
Commit: 09763ec711
@@ -212,6 +212,22 @@ int orte_ess_base_app_setup(void)
         goto error;
     }
 
+    /* if we are an ORTE app - and not an MPI app - then
+     * we need to barrier here. MPI_Init has its own barrier,
+     * so we don't need to do two of them. However, if we
+     * don't do a barrier at all, then one process could
+     * finalize before another one called orte_init. This
+     * causes ORTE to believe that the proc abnormally
+     * terminated
+     */
+    if (ORTE_PROC_IS_NON_MPI) {
+        if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier())) {
+            ORTE_ERROR_LOG(ret);
+            error = "orte barrier";
+            goto error;
+        }
+    }
+
     return ORTE_SUCCESS;
 
 error:
@@ -15,7 +15,7 @@
 
 int main(int argc, char* argv[])
 {
-    int rc, restart;
+    int rc, restart=-1;
     char hostname[512], *rstrt;
     pid_t pid;
 
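The reasoning in the commit message can be illustrated with a small stand-alone model. This is only a sketch and not ORTE code: the type `proc_state_t` and the functions `termination_is_abnormal` and `init_barrier_released` are hypothetical names invented for illustration. It models the rule "a process that terminates after calling init while at least one other process has not yet called init is an error", and shows why a barrier inside init makes that condition unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lifecycle states for one process in a job (not ORTE types). */
typedef enum {
    PROC_LAUNCHED, /* started, has not called init yet */
    PROC_INITED    /* has called init */
} proc_state_t;

/* Model of the rule from the commit message: if process `who` terminates
 * now, is that an abnormal termination? It is when `who` has called init
 * and at least one other process has not. */
static bool termination_is_abnormal(const proc_state_t *procs, size_t n,
                                    size_t who)
{
    assert(who < n);
    if (procs[who] != PROC_INITED) {
        return false; /* this model only covers procs that called init */
    }
    for (size_t i = 0; i < n; i++) {
        if (i != who && procs[i] == PROC_LAUNCHED) {
            return true; /* a peer has not reached init yet */
        }
    }
    return false;
}

/* Model of a barrier placed inside init: it releases only once every
 * process in the job has reached init. */
static bool init_barrier_released(const proc_state_t *procs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (procs[i] == PROC_LAUNCHED) {
            return false;
        }
    }
    return true;
}
```

In this model, whenever `init_barrier_released` returns true, `termination_is_abnormal` returns false for every process. A process that cannot leave init before the barrier releases therefore cannot trigger the error by finishing quickly.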