
pmix/cray: abort job if using aprun for general case

It turns out that there is an incompatibility between the Cray PMI
library and the default configuration for building Open MPI (master).
To work around this, we now disable use of aprun for direct launch
of Open MPI jobs except under specific conditions.

The problem is that there are now (on master) packages getting
initialized that do not work properly across a fork operation.
As part of a constructor in the Cray PMI library, a fork operation
is done to simplify use of shared memory between the
processes in a job on the same node.  This ends up thoroughly
messing up the Open MPI initialization process in the case
that dlopen support is enabled.  The initialization process gets
about half-way through when the PMIX framework is opened and
components are loaded, which triggers the Cray PMI constructor
and hence the fork operation.
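
To illustrate the mechanism described above, here is a minimal
sketch of a library constructor that forks when the library is
loaded. This is NOT Cray PMI's code: the names fake_pmi_ctor,
fake_pmi_constructor_ran and fake_pmi_helper_forked are
hypothetical, and the child process here exits immediately
instead of setting up shared memory. The only behavior taken from
this commit message is that the fork happens in a constructor and
that it is suppressed when PMI_NO_FORK is set.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a PMI-like library; not Cray PMI's code. */
static int constructor_ran = 0;  /* set once the constructor has executed */
static int helper_forked = 0;    /* set if the constructor called fork() */

/*
 * Runs when the object containing it is loaded: at program start if
 * linked directly, or in the middle of the caller's initialization
 * if the object is pulled in later through dlopen().
 */
__attribute__((constructor))
static void fake_pmi_ctor(void)
{
    constructor_ran = 1;

    /* Assumption: the mere presence of PMI_NO_FORK suppresses the fork. */
    if (getenv("PMI_NO_FORK") != NULL) {
        return;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: a real helper would do work here; this one just exits. */
        _exit(0);
    }
    if (pid > 0) {
        helper_forked = 1;
        (void) waitpid(pid, NULL, 0);  /* reap the child */
    }
}

/* Accessors so a caller can observe what the constructor did. */
int fake_pmi_constructor_ran(void) { return constructor_ran; }
int fake_pmi_helper_forked(void)   { return helper_forked; }
```

When such an object is loaded through dlopen() rather than linked
in, the fork lands wherever the loading call happens to be, which
in the case described here is partway through initialization.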

There are two workarounds for this:
1) configure Open MPI for Cray XE/XC systems using aprun with the
   --disable-dlopen option
2) set the PMI_NO_FORK environment variable in the shell in which
   the aprun command is run.
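
Workaround 2 is normally a one-line export in the shell. For
sites that start jobs from a wrapper program instead, the same
effect could be sketched in C as below. The function names
ensure_pmi_no_fork and exec_launcher are hypothetical, and the
value "1" is an assumption: the check added by this commit only
tests whether the variable is present.

```c
#include <stdlib.h>
#include <unistd.h>

/* Make sure PMI_NO_FORK is present in the environment.
 * Returns 0 on success, -1 on failure (as setenv does). */
int ensure_pmi_no_fork(void)
{
    /* overwrite = 0 keeps any value the user already exported */
    return setenv("PMI_NO_FORK", "1", 0);
}

/* Hypothetical wrapper: export PMI_NO_FORK, then replace this process
 * with the launcher command given in argv (for example an aprun
 * command line). Only returns if something failed. */
int exec_launcher(char *const argv[])
{
    if (ensure_pmi_no_fork() != 0) {
        return -1;
    }
    execvp(argv[0], argv);
    return -1;  /* execvp only returns on error */
}
```

The launched processes inherit the variable, since the exec call
keeps the environment of the calling process.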

Without taking these measures, an Open MPI job will just hang at
job startup in the first attempt to "thread-shift" the PMIx
fence_nb operation.  Additional hangs occur at shutdown if this
problem is worked around, again due to the insertion of a fork
operation halfway through the Open MPI initialization procedure.

This commit detects if the conditions that bring out the hang
situation are present, and if so, prints out a message and
aborts the job launch.
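
The decision can be summarized as a small predicate. The sketch
below is an illustration, not the code in this commit: the
function name must_abort_direct_launch and its parameters are
hypothetical. It assumes the caller passes the build-time dlopen
setting and the result of getenv("PMI_NO_FORK"). Note that it
does not detect aprun directly; it relies on PMI_NO_FORK being
absent.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when the launch should be aborted.
 *   dlopen_enabled: whether the build has dlopen support enabled
 *   pmi_no_fork:    value of getenv("PMI_NO_FORK"), NULL when unset
 */
bool must_abort_direct_launch(bool dlopen_enabled, const char *pmi_no_fork)
{
    /* Only the combination "dlopen enabled" and "variable unset" hangs. */
    return dlopen_enabled && pmi_no_fork == NULL;
}
```

Either workaround alone makes the predicate false, which matches
the two options listed above.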

Note that on systems using Slurm, the PMI_NO_FORK environment variable
is set as part of the srun job launch, hence this issue is avoided
on those systems.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit is contained in:
Howard Pritchard 2016-11-24 21:43:52 -07:00
parent 0c8359b0b9
commit eee9f7ae3a
3 changed files with 42 additions and 0 deletions


@@ -9,6 +9,8 @@
# $HEADER$
#
dist_opaldata_DATA = help-pmix-cray.txt
sources = \
pmix_cray.h \
pmix_cray_component.c \

17
opal/mca/pmix/cray/help-pmix-cray.txt Normal file

@@ -0,0 +1,17 @@
-*- text -*-
#
# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2016 Los Alamos National Security, LLC. All rights
# reserved.
#
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for OPAL PMIX Cray module.
#
[aprun-not-supported]
Direct launch with aprun only works when either the PMI_NO_FORK environment
variable is set, or Open MPI is built with dlopen support disabled.


@@ -21,6 +21,7 @@
#include "opal/constants.h"
#include "opal/mca/pmix/pmix.h"
#include "opal/util/show_help.h"
#include "pmix_cray.h"
#include <sys/syscall.h>
#include <pmi.h>
@@ -79,6 +80,28 @@ opal_pmix_cray_component_t mca_pmix_cray_component = {
static int pmix_cray_component_open(void)
{
    /*
     * Turns out that there's a lot of reliance on libevent
     * and the default behavior of Cray PMI to fork
     * in a constructor breaks libevent.
     *
     * Open MPI will not launch correctly on Cray XE/XC systems
     * under these conditions:
     *
     * 1) direct launch using aprun, and
     * 2) PMI_NO_FORK env. variable is not set, nor was
     * 3) --disable-dlopen used as part of configury
     *
     * Under SLURM, PMI_NO_FORK is always set, so we can combine
     * the check for conditions 1) and 2) together
     */
#if OPAL_ENABLE_DLOPEN_SUPPORT
    if (NULL == getenv("PMI_NO_FORK")) {
        opal_show_help("help-pmix-cray.txt", "aprun-not-supported", true);
        exit(-1);
    }
#endif
    return OPAL_SUCCESS;
}